ABSTRACT

STOCHASTIC  APPROXIMATION  WITH  CORRELATED  DATA 

New  almost  sure  convergence  results  for  a special  form  of  the 
multidimensional  Robbins-Monro  stochastic  approximation  procedure  are 
developed.  The  special  form  treated  is  motivated  by  a consideration  of 
several  algorithms  that  have  been  proposed  for  discrete  time  adaptive 
signal  processing  applications.  Most  of  these  algorithms  can  also  be 
viewed  as  stochastic  gradient-following  algorithms. 

Essentially, previous convergence results contain a common
"conditional expectation condition" which is extremely difficult (if
not impossible) to satisfy when the "training data" is a correlated
sequence. In contrast, the new convergence results developed in the
present work are easily applied to cases where the "training data" is
heavily correlated. In fact, the new convergence results are applicable
when certain moments exist and certain "decay rates" on two
autocovariance functions can be established. For example, when the data
sequence is normal and (i) M-dependent, (ii) autoregressive moving
average (ARMA), or (iii) can be viewed as samples of a bandlimited
continuous-time process, the new convergence results can be applied to
establish the almost sure convergence of each algorithm treated.

Several special forms of data correlation matrices that are shown
to arise in discrete time signal processing are examined. New
computationally efficient procedures are developed for both the inversion
of a matrix having one of the treated special forms and for the solution
of a corresponding set of simultaneous linear equations. The special
forms treated are termed Toeplitz and block Toeplitz matrices.
ADDENDUM

Since the completion of this report the author has become aware of the
excellent paper by H. Akaike, entitled "Block Toeplitz Matrix Inversion"
(SIAM J. Appl. Math., Vol. 24, March 1973, pp. 234-241). Most of the
results treating block Toeplitz matrices which are developed in Chapter
V of the present work have been developed by Akaike. In case the block
Toeplitz matrix involved is both symmetric and persymmetric, a case which
arises, for example, when each block of a symmetric block Toeplitz matrix
is a Toeplitz matrix, the results of the present work provide a more
efficient solution than the results of Akaike.
I. INTRODUCTION


Although stochastic signal processing can be viewed as a branch of
time series analysis, the desire to implement simple sequential real-time
signal processing structures motivates one to approach signal
processing problems in a decidedly different manner than one would
approach related time series problems. This work is devoted to a unified
analytical treatment of algorithms that have been proposed for discrete
time adaptive signal processing. These algorithms are treated within
the framework of the multidimensional Robbins-Monro stochastic
approximation procedure. The special form of the Robbins-Monro procedure
which is treated herein and the convergence results obtained are of
interest in their own right, having applications outside the realm of
adaptive signal processing.


A.  Motivation:  Adaptive  Signal  Processing 

In many signal processing applications, the ultimate goal is to
provide an "optimal" estimate of some signal process which is imbedded
in an additive noise process. The physical implementation of the
"optimal" estimator (or filter structure) requires that certain
parameters of the signal and noise processes be known. The filter
structure is usually constrained to be a causal, linear structure, and
the optimality criterion is often minimum mean square error (MMSE). For
this case, the optimal filter is well known to be the Wiener filter or
the Kalman filter. These filters can be implemented provided that the
required parameters are known. For discrete time signal processing
with uncorrelated signal and noise processes, the required parameters
are those which completely specify the signal and the noise
autocorrelation sequences. The required parameter set may or may not be
finite.



In case the required parameters are unknown, identification
techniques (see, e.g., [1], [2]) can be used, at least in some cases, to
estimate the desired parameters. The estimated parameters can then be
used to implement the required filter. Due to inherent uncertainties
in the estimated parameters, the performance of the resulting filter
can differ dramatically from the performance of the desired optimal
filter. A closely related approach is to constrain the filter to have
a certain fixed suboptimal structure and to estimate the corresponding
family of parameters required to implement the simpler structure.

An  interesting  concept  that  has  evolved  from  the  latter  approach 
is  the  concept  of  an  "adaptive  filter."  The  term  "adaptive  filter" 
is  used  throughout  this  work  to  denote  a filter  which  designs  itself, 
either  from  the  raw  input  data,  or  from  some  training  data.  Many  of  the 
algorithms  used  for  adaptive  signal  processing  are  stochastic  versions 
of  gradient-following  procedures.  Significant  early  contributions  to 
adaptive signal processing were made by Widrow and Hoff [3], and by
Sakrison [4]. A more complete treatment of the relevant literature is
given in Chapter II.

Primary considerations in the application of adaptive signal
processing techniques are the convergence properties of the algorithms
used. Most of the algorithms which have been proposed for use in
adaptive signal processing applications are slight modifications of
multidimensional versions of either the Robbins-Monro stochastic
approximation procedure [5] or the Kiefer-Wolfowitz stochastic
approximation procedure [6]. Unfortunately, many proposed uses for
adaptive signal processing involve processes for which available
convergence results are inapplicable.
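The scalar Robbins-Monro procedure [5] can be illustrated with a short Python sketch. The regression function, the noise model, and the gain sequence $a_k = c/k$ below are illustrative assumptions. They are not the special multidimensional form treated in this work. The observations here are also independent, unlike the correlated data of main interest.

```python
import numpy as np

def robbins_monro(observe, theta0, n_steps, c=1.0, seed=0):
    """Scalar Robbins-Monro iteration theta_{k+1} = theta_k - a_k * Y_k.

    observe(theta, rng) returns a noisy observation Y_k whose mean M(theta)
    vanishes at the sought root.  The gains a_k = c / k satisfy
    sum a_k = infinity and sum a_k**2 < infinity.
    """
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    for k in range(1, n_steps + 1):
        theta -= (c / k) * observe(theta, rng)
    return theta

# Assumed regression function M(theta) = theta - 2, observed in
# zero-mean, unit-variance Gaussian noise.
def observe(theta, rng):
    return (theta - 2.0) + rng.standard_normal()

theta_hat = robbins_monro(observe, theta0=10.0, n_steps=20000)
print(theta_hat)  # should lie close to the root theta* = 2
```

With $c = 1$ this particular example reduces to a running average of the noise, so the estimate error shrinks roughly like $1/\sqrt{k}$.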
B. Purpose

The purpose of the present work is to (i) establish a unified
framework suitable for the analytical treatment of algorithms which have
been proposed for adaptive signal processing applications, (ii)
investigate the probabilistic convergence properties of algorithms which
fall within this framework, and (iii) examine the detailed structure of
several special forms of data correlation matrices that arise in
discrete time signal processing applications.


C.  Contents  and  Organization 

In Chapter II, several representative systems that have been
proposed for adaptive signal processing are reviewed, including systems
used for adaptive channel equalization and adaptive array processing.
Most of the algorithms that are treated in Chapter II are shown to fall
into a specialized form of the multidimensional Robbins-Monro stochastic
approximation procedure.

Existing convergence results for the Robbins-Monro procedure are
examined in detail in Chapter III. The need for additional analytical
work to establish meaningful probabilistic convergence for the
algorithms treated in Chapter II is established.

In Chapter IV, new convergence results are developed, providing
an almost sure (a.s.) convergence proof for a certain family of
algorithms under conditions which are easily verified. For example, in
the normal case, when the input signal and noise processes are
M-dependent, or stable autoregressive moving average (ARMA) processes,
or can be viewed as samples of bandlimited continuous time processes,
the new convergence results establish the almost sure convergence of
each member of the family of algorithms treated.
In Chapter V, several special forms of data correlation matrices
that are shown to arise in discrete time signal processing are examined.
New computationally efficient procedures are developed for both the
inversion of a matrix having one of the treated special forms and for
the solution of a corresponding set of simultaneous linear equations.

The special forms treated are termed Toeplitz and block Toeplitz
matrices. The new procedures represent an efficient method for
designing the desired suboptimal MMSE filter in case the required
correlation sequence values are known a priori.
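The procedures of Chapter V are not reproduced here. As a point of reference only, the following Python sketch implements the classical Levinson recursion. It solves a symmetric Toeplitz system in $O(n^2)$ operations, compared with $O(n^3)$ for general elimination. It is a standard technique, not a rendering of the new procedures. The example correlation sequence $r_j = 0.8^j$ is an assumption, chosen because the resulting matrix is positive definite.

```python
import numpy as np

def levinson_solve(r, rhs):
    """Solve T x = rhs, where T[i, j] = r[|i - j|] is symmetric Toeplitz.

    Classical Levinson recursion in O(n^2) operations.  Assumes every
    leading principal submatrix of T is nonsingular (e.g. T positive definite).
    """
    r = np.asarray(r, dtype=float)
    rhs = np.asarray(rhs, dtype=float)
    n = len(rhs)
    f = np.array([1.0 / r[0]])        # forward vector: T_1 f = e_1
    x = np.array([rhs[0] / r[0]])     # solution of the leading 1 x 1 system
    for k in range(1, n):
        rev = r[k:0:-1]               # (r[k], r[k-1], ..., r[1])
        eps = rev @ f                 # last entry of T_{k+1} [f; 0]
        f_ext = np.append(f, 0.0)
        # By symmetry the backward vector is the forward vector reversed.
        f = (f_ext - eps * f_ext[::-1]) / (1.0 - eps ** 2)
        b = f[::-1]                   # T_{k+1} b = e_{k+1}
        x = np.append(x, 0.0) + (rhs[k] - rev @ x) * b
    return x

# Example: correlation sequence of a first-order autoregressive process.
n = 6
r = 0.8 ** np.arange(n)
T = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])
rhs = np.arange(1.0, n + 1)
x = levinson_solve(r, rhs)
print(np.allclose(T @ x, rhs))
```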

A summary of new results and suggestions for future work is
presented in Chapter VI.


II. SYSTEMS PROPOSED FOR ADAPTIVE SIGNAL PROCESSING


In this chapter, several systems which have been proposed for
adaptive signal processing applications are reviewed. In Section II-A
the channel equalization problem is treated; Section II-B is devoted
to a treatment of the adaptive array problem. The main point to be
developed in this chapter is that many algorithms proposed for adaptive
signal processing fall into the realm of "stochastic gradient-following
algorithms" and, as such, the convergence properties of these
algorithms may be treated in a somewhat unified manner. The literature
reviewed here is representative of the most significant contributions
in recent years on the topic of "adaptive signal processing."

A.  Systems  Proposed  for  Adaptive  Channel  Equalization 

In this section, several systems which have been proposed for
adaptive channel equalization are reviewed. The motivating problem, to
which these systems are applicable, is the automatic equalization of
voice-grade telephone channels to reduce intersymbol interference, thus
enabling a much higher data rate for digital signal transmission. Such
channels usually are characterized as having a moderately high
signal-to-noise ratio.

It is assumed throughout this section that for the equivalent
baseband system at time $t = kT$, $k = 0, 1, 2, \ldots$, a real-valued random
variable $a_k$ is transmitted into a linear time-invariant channel having
unit pulse response $\{h_k\}_{k=-\infty}^{\infty}$. The output of the channel is
corrupted by an additive noise sequence $\{n_k\}$, so that the equalizer
input is

$$ x_k = \sum_{i=-\infty}^{\infty} a_i h_{k-i} + n_k . \tag{2.1} $$

The sequence $\{a_k\}$ is the information-bearing sequence which is to be
estimated by the output of the adaptive equalizer. For digital data
transmission, $a_k$ is chosen from a set of M discrete amplitudes via
some probabilistic rule. It is assumed throughout that $\{a_k\}$ and $\{n_k\}$
are uncorrelated, i.e., that $E\{a_k n_l\} = E\{a_k\}E\{n_l\}$ for all integers
$k, l$, and that $E\{n_k\} = 0$, where $E\{\cdot\}$ denotes statistical expectation.

A commonly used equalizer structure is a transversal filter having
p adjustable weights. Defining W and $X_k$ by

$$ W' = (w_1, w_2, \ldots, w_p), \qquad X_k' = (x_k, x_{k-1}, \ldots, x_{k-p+1}), \tag{2.2} $$

where $'$ denotes matrix transpose, the output of the transversal
equalizer can be written as

$$ y_k = W'X_k . \tag{2.3} $$


Suppose that it is desired to choose W so that $y_k$ is a "best
estimate" of $a_{k-\alpha}$ for all $k = \alpha, \alpha+1, \ldots$, and for some fixed
integer $\alpha$. A number of "criteria of goodness" have been proposed for
characterizing the "best estimate." Defining

$$ H_k' = (h_k, h_{k-1}, \ldots, h_{k-p+1}), \qquad N_k' = (n_k, n_{k-1}, \ldots, n_{k-p+1}), \tag{2.4} $$

the output of the transversal equalizer, $y_k$, can be expressed as

$$ y_k = W'H_\alpha \Big( a_{k-\alpha} + (W'H_\alpha)^{-1} \sum_{i \neq k-\alpha} a_i W'H_{k-i} \Big) + W'N_k . \tag{2.5} $$
i/k-a 
From (2.5), it can be readily seen that the distortion due to
intersymbol interference at time $t = kT$ is given by

$$ I_k = (W'H_\alpha)^{-1} \sum_{i \neq k-\alpha} a_i W'H_{k-i} . \tag{2.6} $$

One easily obtained bound for $I_k$ is given by

$$ |I_k| \le B = \max_i |a_i| \, |W'H_\alpha|^{-1} \sum_{\substack{m=-\infty \\ m \neq \alpha}}^{\infty} |W'H_m| . \tag{2.7} $$
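A short numerical check of (2.6) and (2.7) follows. The Python sketch computes the overall pulse response $W'H_m$ as a convolution. It assumes a causal channel indexed from $k = 0$, so $h_k = 0$ for $k < 0$. The channel, taps, delay, and binary alphabet are illustrative assumptions, and the helper names are hypothetical.

```python
import numpy as np

def overall_response(W, h):
    """g[m] = W'H_m, m = 0, 1, ..., for a causal channel h (h_k = 0 for k < 0)."""
    return np.convolve(h, W)

def isi_bound(W, h, alpha, max_amp):
    """B of (2.7): max|a_i| * |W'H_alpha|^{-1} * sum over m != alpha of |W'H_m|."""
    g = overall_response(W, h)
    return max_amp * (np.sum(np.abs(g)) - abs(g[alpha])) / abs(g[alpha])

def isi(W, h, alpha, a, k):
    """I_k of (2.6): (W'H_alpha)^{-1} * sum over i != k - alpha of a_i W'H_{k-i}."""
    g = overall_response(W, h)
    total = sum(a[i] * g[k - i] for i in range(len(a))
                if i != k - alpha and 0 <= k - i < len(g))
    return total / g[alpha]

# Illustrative channel, two-tap equalizer, and delay (assumed values).
h = np.array([0.2, 1.0, -0.3])
W = np.array([1.0, -0.2])
alpha = 1
B = isi_bound(W, h, alpha, max_amp=1.0)
print(B)  # approximately 0.7917, i.e. 0.76 / 0.96

rng = np.random.default_rng(1)
a = rng.choice([-1.0, 1.0], size=200)       # binary symbols, so max |a_i| = 1
g = overall_response(W, h)
y = np.convolve(a, g)                       # noise-free equalizer output
worst = max(abs(isi(W, h, alpha, a, k)) for k in range(len(g), len(a)))
print(worst <= B)
```

The noise-free output decomposes as $y_k = W'H_\alpha (a_{k-\alpha} + I_k)$, in agreement with (2.5). The worst observed distortion never exceeds B.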


It is noted that for channels having severe intersymbol interference,
B may be infinite; however, in case B is infinite the channel has the
interpretation of an unstable linear system in the bounded-input,
bounded-output sense. Lucky [7] has considered automatic equalization
from the point of view of minimizing B with respect to W subject
to the constraint that $W'H_\alpha = 1$. The constraint that $W'H_\alpha = 1$ is
convenient for digital detection in that the decision regions (or
slicing levels) can remain fixed under this constraint. The procedure
proposed by Lucky [7] makes use of a sequence of isolated unit training
pulses. Define D by


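$$ D = \sum_{\substack{m=-\infty \\ m \neq \alpha}}^{\infty} |W'H_m| = \sum_{\substack{m=-\infty \\ m \neq \alpha}}^{\infty} W'H_m \, \mathrm{sgn}(W'H_m) , \tag{2.8} $$

where $\mathrm{sgn}(y) = 1$ if $y > 0$ and $\mathrm{sgn}(y) = -1$ if $y < 0$. Noting that
(formally)

$$ \frac{\partial D}{\partial w_i} = \sum_{\substack{m=-\infty \\ m \neq \alpha}}^{\infty} h_{m-i+1} \, \mathrm{sgn}(W'H_m) \tag{2.9} $$

and assuming that $h_k \approx c\,\delta_{k,0}$, where $\delta_{k,0}$ is the Kronecker delta,

$$ \frac{\partial D}{\partial w_i} \approx c \, \mathrm{sgn}(W'H_{i-1}) , \tag{2.10} $$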


for all $i = 1, 2, \ldots, p$ and $i \neq \alpha + 1$. Furthermore, from (2.1) it
follows that for $a_k = \delta_{k,0}$, $x_k = h_k + n_k$; hence, from (2.3),
$y_k \approx W'H_k$. Consequently, Lucky considers incrementing the weight
vector, W, by the following scheme: after each test pulse has arrived,
increment $w_i$ by $-\mu \, \mathrm{sgn}\, y_{i-1}$ for $i \neq \alpha + 1$, and increment $w_{\alpha+1}$ by
$-\mu \, \mathrm{sgn}(y_\alpha - 1)$. The constant $\mu > 0$ is termed the step size. For
channels capable of supporting binary transmission without equalization,
Lucky shows that $|W'H_{i-1}|$ is asymptotically bounded by $2\mu$ for all
$i = 1, 2, \ldots, p$, $i \neq \alpha + 1$, assuming an infinite signal-to-noise ratio.
Similarly, he shows that $|W'H_\alpha - 1|$ is asymptotically bounded by $2\mu$. In
[8], Lucky extends the results of [7] to obtain a decision-directed
adaptive equalizer which does not require a sequence of isolated training
pulses and can "track" slow time variations in the channel
characteristics. Lucky also investigates what has since been called the
"probability of a runaway" for his system. The equalizers introduced by
Lucky have also been called "zero-forcing equalizers" in that they tend
to force $p - 1$ zeros in the overall unit pulse response $W'H_m$.
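The sign-based tap adjustment just described can be sketched as follows for the noise-free (infinite signal-to-noise ratio) case. The channel is an assumed example that satisfies a simple open-eye condition: $|h_0| = 1$ is larger than the sum of the remaining $|h_k|$. Under that assumption each error $W'H_m - \delta_{m,\alpha}$, $m = 0, \ldots, p-1$, settles inside $\pm 2\mu$, in line with the bound quoted above. The function name is hypothetical. NumPy's `np.sign` returns 0 for a zero argument, a case the text leaves undefined.

```python
import numpy as np

def lucky_zero_forcing(h, p, alpha, mu, n_pulses):
    """Sign-based zero-forcing tap adjustment, noise-free (hypothetical helper).

    After each isolated test pulse the outputs y_m = W'H_m, m = 0..p-1, are
    observed.  With 0-based taps, W[j] moves by -mu*sgn(y_j) for j != alpha and
    W[alpha] by -mu*sgn(y_alpha - 1); text tap w_i corresponds to W[i-1].
    """
    h = np.asarray(h, dtype=float)
    # A[m, j] = h[m - j], so (A @ W)[m] = W'H_m for a causal channel h
    A = np.array([[h[m - j] if 0 <= m - j < len(h) else 0.0
                   for j in range(p)] for m in range(p)])
    target = np.zeros(p)
    target[alpha] = 1.0
    W = np.zeros(p)
    for _ in range(n_pulses):
        err = A @ W - target       # y_m for m != alpha, and y_alpha - 1
        W -= mu * np.sign(err)     # np.sign(0) = 0 leaves that tap unchanged
    return W, A @ W - target

# Assumed channel with an open eye: h_0 = 1 exceeds 0.3 + 0.2 + 0.1 = 0.6.
mu = 0.01
W, err = lucky_zero_forcing([1.0, 0.3, -0.2, 0.1], p=5, alpha=1,
                            mu=mu, n_pulses=2000)
print(np.max(np.abs(err)))   # stays within 2 * mu
```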

Gersho [9] has considered a scheme somewhat similar in nature to
that of Lucky [7]. He considers minimizing the deterministic $\ell^2$ norm
of the error sequence. The error sequence is the difference between the
deterministic part of the equalizer output and the desired output,
assuming that a sequence of known isolated training pulses is being
sent. Suppose for the moment that $n_k$ in (2.1) is identically zero.
Then, with $y_k$ given by (2.3) and $d_k$ the desired equalizer output,
Gersho [9] considers choosing W to minimize

$$ e = \sum_{l} (y_l - d_l)^2 . \tag{2.11} $$

Motivated by (2.11), Gersho considers minimizing

$$ \varepsilon_k = \sum_{l \in J_k} (y_l - d_l)^2 , \tag{2.12} $$

where it is not assumed that $n_k = 0$, and $J_k$ is an index set defined by

$$ J_k = \{ l_0 + k\ell,\ l_0 + k\ell + 1,\ \ldots,\ l_0 + k\ell + K \} . \tag{2.13} $$


Gersho assumes that $x_l$ and $d_l$ are virtually zero for all $l \notin J_k$,
for an isolated unit pulse sent at $t = k\ell$, and that $\ell > K$. The
gradient of $\varepsilon_k$ with respect to W can be expressed as

$$ \nabla_W \varepsilon_k \Big|_{W = W_k} = 2F_kW_k - 2P_k , \tag{2.14} $$

where

$$ F_k = \sum_{l \in J_k} X_l X_l' \tag{2.15} $$

and

$$ P_k = \sum_{l \in J_k} d_l X_l . \tag{2.16} $$

The resulting algorithm for "training" the weight vector, W, is given by

$$ W_{k+1} = W_k - \mu (F_kW_k - P_k) , \tag{2.17} $$


where $W_0$ is arbitrary and $\mu > 0$. It is worth noting that for
$R = E\{F_k\}$ and $P = E\{P_k\}$, $W = R^{-1}P$ is the weight vector that minimizes
$E\{\varepsilon_k\}$, where $\varepsilon_k$ is given by (2.12). Gersho [9] shows that for a
suitably small $\mu > 0$, $E\{(W_k - R^{-1}P)'(W_k - R^{-1}P)\}$ can be asymptotically
bounded by some $\epsilon(\mu)$, where $\epsilon(\mu) \to 0$ as $\mu \to 0$. Furthermore, he
points out that for increasingly large signal-to-noise ratios,
$R^{-1}P \to A^{-1}P$, where A is given by

$$ A = \sum_{l \in J_k} E\{X_l\}E\{X_l'\} . \tag{2.18} $$

The weight vector $W = A^{-1}P$ characterizes the equalizer structure
which will minimize the noise-free criterion of (2.11). Gersho also
discusses techniques for choosing $\mu$ to maximize the convergence rate.

Niessen and Willim [10] consider the minimization of

$$ \varepsilon = E\{(y_k - a_{k-\alpha})^2\} \tag{2.19} $$

with respect to W, assuming that $\{a_k\}$ and $\{n_k\}$ are jointly wide-sense
stationary and that $E\{a_k n_l\} = 0$ for all $k, l$. With $y_k$ given
by (2.3), the gradient of $\varepsilon$ with respect to W is

$$ \nabla_W \varepsilon = 2R_{xx}W - 2P , \tag{2.20} $$

where $R_{xx} = E\{X_kX_k'\}$ and $P = E\{a_{k-\alpha}X_k\}$. Equations (2.19) and (2.20)
suggest the algorithm

$$ W_{k+1} = W_k - \mu (R_{xx}W_k - P) . \tag{2.21} $$

Unfortunately, since $R_{xx}$ and P are assumed to be unknown, (2.21) is
inapplicable. Consequently, Niessen and Willim consider approximating
$R_{xx}W_k$ by $X_k y_k$ and P by $\hat{y}_k X_k$, where $\hat{y}_k$ is a quantized version of
$y_k$. The quantization of $y_k$ is performed according to the a priori
known possible discrete levels of $\{a_k\}$. Substituting these
approximations into (2.21), the following algorithm results:

$$ W_{k+1} = W_k - \mu X_k (y_k - \hat{y}_k) . \tag{2.22} $$

This algorithm represents what is commonly referred to as a
"decision-directed equalizer" in that decisions which are made about
$a_{k-\alpha}$ (i.e., $\hat{y}_k$) are used in the algorithm to train the weight vector.
Clearly, in order for the algorithm given by (2.22) to "converge,"
$\hat{y}_k$ must initially be a very reliable estimate of $a_{k-\alpha}$. The
convergence analysis performed by Niessen and Willim [10] is essentially
deterministic and assumes $\hat{y}_k = a_{k-\alpha}$.

In order for the strategy represented by (2.22) to be useful for
moderately low signal-to-noise ratios (viz., less than 30 dB), a
constraint such as that used by Lucky [7], i.e., $W'H_\alpha = 1$, seems to be
essential. It is noted that the technique of Niessen and Willim does not
inherently require an initial "setup period" with known isolated training
pulses, and it can be capable of "tracking" slowly time-varying channels.
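Below is a minimal Python sketch of the decision-directed update (2.22). It assumes binary symbols $a_k \in \{-1, +1\}$, a delay $\alpha = 0$, and an illustrative minimum-phase channel whose eye is open before equalization. The initial decisions $\hat{y}_k$ are therefore reliable, which is the situation the preceding discussion requires. The true symbols are used only to monitor the squared error, never in the update, and the function name is hypothetical.

```python
import numpy as np

def dd_equalizer(h, p, mu, n, noise_std, seed=0):
    """Decision-directed update (2.22) with binary symbols and delay alpha = 0."""
    rng = np.random.default_rng(seed)
    a = rng.choice([-1.0, 1.0], size=n)                           # information symbols
    x = np.convolve(a, h)[:n] + noise_std * rng.standard_normal(n)  # channel output (2.1)
    W = np.zeros(p)
    W[0] = 1.0                                  # start from "no equalization"
    sq_err = np.full(n, np.nan)
    for k in range(p - 1, n):
        X = x[k - p + 1:k + 1][::-1]            # (x_k, x_{k-1}, ..., x_{k-p+1})
        y = W @ X                               # equalizer output (2.3)
        y_hat = 1.0 if y >= 0 else -1.0         # quantize to the known levels
        W = W - mu * X * (y - y_hat)            # update (2.22)
        sq_err[k] = (y - a[k]) ** 2             # true error, for monitoring only
    baseline = np.mean((x - a) ** 2)            # squared error with no equalizer
    return W, sq_err, baseline

W, sq_err, baseline = dd_equalizer([1.0, 0.25, -0.15], p=8, mu=0.01,
                                   n=20000, noise_std=0.05)
print(baseline, np.mean(sq_err[-2000:]))
```

If the initial eye were closed, the early decisions would be unreliable and the same update could drift away, which is the failure mode the text warns about.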


George et al. [11] consider a decision feedback strategy somewhat similar
to that of Niessen and Willim [10], using an adaptive transversal filter
following the quantizer. The output of this second transversal filter
is fed back into the input of the quantizer. Monsen [12] presents a
performance comparison of decision feedback and linear equalizers.

Schonfeld and Schwartz [13] consider the following algorithm,
which is quite similar to (2.17):

$$ W_{k+1} = W_k - \alpha_k (F_kW_k - P_k) , \tag{2.23} $$

where $F_k$ and $P_k$ are given by (2.15) and (2.16), respectively. With
$R = E\{F_k\}$ and $P = E\{P_k\}$, Schonfeld and Schwartz [13] choose

$$ \alpha_k = \frac{2}{(\lambda_U + \lambda_L) - (\lambda_U - \lambda_L)\, t_k} \tag{2.24} $$

for $k = 0, 1, \ldots, N-1$, where all of the eigenvalues of R are assumed
to be contained in the interval $[\lambda_L, \lambda_U]$. In [13], they show that this
choice of $\alpha_k$ is optimal in the minimax sense for minimizing
$E\{W_N - R^{-1}P\}'E\{W_N - R^{-1}P\}$. In [14], Schonfeld and Schwartz extend the
above philosophy to obtain a second-order algorithm:

$$ W_{k+1} = W_k - \alpha_k (F_kW_k - P_k) + \beta_k (W_k - W_{k-1}) , \tag{2.25} $$

where $\beta_0 = 0$ and $\{\alpha_j\}$ and $\{\beta_j\}$ are chosen to minimize
$E\{W_k - R^{-1}P\}'E\{W_k - R^{-1}P\}$ in the minimax sense for all k. Both of
these algorithms ((2.23) with (2.24), and (2.25)) force $E\{W_k\}$ to converge
more rapidly than, e.g., (2.17). Consequently, these algorithms seem to be
useful when equalizing high signal-to-noise ratio channels in a training
mode by sending a sequence of isolated known pulses.
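The difference between the constant step of (2.17) and the varying steps of (2.23)-(2.24) is easiest to see in the noise-free case, where $F_k = F$ and $P_k = P$ for every pulse. The sketch below assumes that the $t_k$ of (2.24) are the Chebyshev nodes $\cos((2k+1)\pi/2N)$, the classical Chebyshev (Richardson) choice. This interpretation is an assumption, as are the matrix and vector used. Under it, the error norm after N steps is at most $1/T_N\big((\lambda_U+\lambda_L)/(\lambda_U-\lambda_L)\big)$ times its initial value, where $T_N$ is the Chebyshev polynomial of degree N.

```python
import numpy as np

def iterate(F, P, steps, W0):
    """W_{k+1} = W_k - step_k (F W_k - P): (2.17) if steps are constant, else (2.23)."""
    W = np.array(W0, dtype=float)
    for step in steps:
        W = W - step * (F @ W - P)
    return W

def chebyshev_steps(lam_lo, lam_hi, N):
    """Step sizes of the form (2.24), ASSUMING t_k = cos((2k+1) pi / (2N))."""
    t = np.cos((2 * np.arange(N) + 1) * np.pi / (2 * N))
    return 2.0 / ((lam_hi + lam_lo) - (lam_hi - lam_lo) * t)

# Noise-free case: F_k = F and P_k = P for every pulse (assumed values).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
lam_lo, lam_hi, N = 1.0, 10.0, 10
F = Q @ np.diag(np.linspace(lam_lo, lam_hi, 6)) @ Q.T   # eigenvalues in [1, 10]
P = rng.standard_normal(6)
W_opt = np.linalg.solve(F, P)
W0 = np.zeros(6)

err0 = np.linalg.norm(W0 - W_opt)
mu = 2.0 / (lam_lo + lam_hi)          # best constant step for (2.17) on [1, 10]
err_const = np.linalg.norm(iterate(F, P, [mu] * N, W0) - W_opt)
err_cheb = np.linalg.norm(iterate(F, P, chebyshev_steps(lam_lo, lam_hi, N), W0) - W_opt)
print(err_const / err0, err_cheb / err0)
```

Here the constant-step error ratio is at most $(9/11)^{10} \approx 0.13$, while the varying steps reduce it below about 0.003.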


Kosovych and Pickholtz [15] consider a successive overrelaxation
technique for training the weight vector of a transversal equalizer
during a training period using isolated pulses for the minimization of
the mean-squared error $E\{\varepsilon_k\}$, given by (2.12). With $F_k$ and $P_k$
given by (2.15) and (2.16), respectively, the overrelaxation algorithm
considered by Kosovych and Pickholtz is given by

$$ W_{k+1} = W_k - \omega (D_k - \omega E_k^-)^{-1} (F_kW_k - P_k) , \tag{2.26} $$

where $\omega > 0$ is a "relaxation factor," and $D_k$ and $E_k^-$ are, respectively,
diagonal and strictly lower triangular matrices, such that
$F_k = D_k - E_k^- - E_k^+$. Here $E_k^+$ is strictly upper triangular, leaving $D_k$
to be composed of the diagonal elements of $F_k$. Denoting the ij-th
element of a matrix A by $(A)_{ij}$, (2.26) can be written as

$$ (W_{k+1})_{i1} = (W_k)_{i1} - \frac{\omega}{(F_k)_{ii}} \Big\{ \sum_{j<i} (F_k)_{ij}(W_{k+1})_{j1} + \sum_{j \ge i} (F_k)_{ij}(W_k)_{j1} - (P_k)_{i1} \Big\} . \tag{2.27} $$
Note that (2.27) does not require any matrix inversions. Kosovych and
Pickholtz [15] discuss methods for choosing $\omega$ and compare convergence
rates of (2.26), (2.17), and (2.25) via computer simulations. They
also obtain a bound on the asymptotic mean square error in $W_k$, assuming
that $F_k, F_l$ and $P_k, P_l$ are independent for all $k \neq l$.
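The component form (2.27) translates directly into code. In the Python sketch below, the inner loop overwrites each entry of W as soon as it is updated. That is exactly what makes the sums over $j < i$ use $(W_{k+1})_j$. The noise-free setting ($F_k = F$, $P_k = P$), the banded matrix, and $\omega = 1.1$ are illustrative assumptions, and `sor_step` is a hypothetical name.

```python
import numpy as np

def sor_step(F, P, W, omega):
    """One pass of (2.27); equivalent to (2.26) but with no matrix inversion."""
    W = np.array(W, dtype=float)
    for i in range(len(W)):
        # W[:i] already holds (W_{k+1})_j for j < i; W[i:] still holds (W_k)_j.
        W[i] -= omega / F[i, i] * (F[i] @ W - P[i])
    return W

# Noise-free illustration: F_k = F, P_k = P (a banded, diagonally dominant F).
p = 5
r = [1.2, 0.3, -0.1]
F = np.array([[r[abs(i - j)] if abs(i - j) < len(r) else 0.0
               for j in range(p)] for i in range(p)])
P = np.arange(1.0, p + 1)
W = np.zeros(p)
for _ in range(200):
    W = sor_step(F, P, W, omega=1.1)
print(np.allclose(W, np.linalg.solve(F, P)))
```

With $F_k$ and $P_k$ recomputed for each training pulse, one call per pulse would correspond to (2.26).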

Recalling from (2.3) that it is desired to train the weight
vector, W, of a transversal equalizer to "optimize" the approximation
$y_k \approx a_{k-\alpha}$, it is noted that most of the systems discussed so far have
assumed that $p = 2N + 1$ and $\alpha = N$. Qureshi [16] presents an adaptive
technique for choosing $\alpha$ and training W simultaneously. Kobayashi
[17] presents a more general technique using maximum likelihood
estimation and the Robbins-Monro stochastic approximation procedure to
estimate $\{a_n\}$, sample timing, and carrier phase. Walzman and Schwartz
[18], [19] present a discrete frequency domain approach to the adaptive
transversal equalizer problem. Benedetto and Biglieri [20] discuss a
Kalman filter theory approach to the reduction of intersymbol
interference.

Finally,  the  importance  of  the  Viterbi  algorithm  to  sequence 
estimation  for  data  transmitted  over  dispersive  channels  should  be 
noted.  Forney  [21]  introduces  a receiver  structure  consisting  of  a 
whitened  matched  filter,  a symbol-rate  sampler,  and  the  Viterbi  algorithm. 
In  [21]  it  is  shown  that  this  structure  is  a maximum-likelihood 
estimator  of  the  entire  transmitted  sequence.  Qureshi  and  Newhall  [22] 
and Magee and Proakis [23] discuss adaptive structures which make use of
the  Viterbi  algorithm.  Both  of  these  schemes  ([22]  and  [23])  include 
an  adaptive  transversal  filter  having  a weight  vector,  W,  which  is 
trained  by  a stochastic  gradient-following  algorithm. 
B. Systems Proposed for Adaptive Array Processing

In this section, several systems which have been proposed for
adaptive array processing are reviewed. Data from an array of sensors
(e.g., hydrophones, seismometers, or antennas) can be "optimally"
processed to reject certain directional components of the observed field
and provide an estimate of some desired signal component (e.g.,
[24]-[30]). Adaptive array processing is used to compensate for varying
degrees of a priori statistical ignorance in such problems.

Consider an array of L sensors, each sensor followed by a tapped
delay line having M equally spaced taps. Denote the delay between
adjacent taps on each delay line by D, and denote by $x_\ell(t)$ the output
at time t of the $\ell$th sensor, $\ell = 1, 2, \ldots, L$. Define the $ML \times 1$
matrix $X(t)$ by

$$ X'(t) = (x_1(t), x_2(t), \ldots, x_{ML}(t)) , \tag{2.28} $$

where $x_{\ell+(m-1)L}(t) = x_\ell(t - (m-1)D)$ for all $\ell = 1, 2, \ldots, L$ and for
all $m = 1, 2, \ldots, M$. Define the $ML \times 1$ matrix W (the so-called array
weight vector) by

$$ W' = (w_1, w_2, \ldots, w_{ML}) . \tag{2.29} $$

The output of the array is given by

$$ y(t) = W'X(t) . \tag{2.30} $$

It is assumed that $X(t)$ can be expressed as

$$ X(t) = S(t) + N(t) , \tag{2.31} $$

where $S(t)$ is a vector of signal components and $N(t)$ is a vector of
noise and/or interference components. For the purposes of this section,
the goal of the array processor design is to choose the weight vector, W,
so that $y(t)$ will have certain desired properties. For example, W
might be chosen so that $y(t)$ is a minimum mean-square error (MMSE)
estimate of some desired signal component, $d(t)$.

Shor [31] considers a simple stochastic gradient-following
technique to maximize an estimate of the output signal-to-noise ratio
for a narrowband array processor. The technique given in [31] is
presented here for the more general array structure defined in the
preceding paragraph. Define

$$ s_{\mathrm{out}}(t) = W'S(t), \qquad n_{\mathrm{out}}(t) = W'N(t), \tag{2.32} $$

$$ s_k = \frac{1}{T}\int_{(k-1)T}^{kT} s_{\mathrm{out}}^2(t)\,dt , \qquad n_k = \frac{1}{T}\int_{(k-1)T}^{kT} n_{\mathrm{out}}^2(t)\,dt , \tag{2.33} $$

for $k = 1, 2, \ldots$. In order to maximize $s_k/n_k$, Shor considers the
algorithm

$$ W_{k+1} = W_k + \lambda \frac{s_k}{n_k} \Big\{ \frac{1}{s_kT}\int_{(k-1)T}^{kT} S(t)\, s_{\mathrm{out}}(t)\,dt - \frac{1}{n_kT}\int_{(k-1)T}^{kT} N(t)\, n_{\mathrm{out}}(t)\,dt \Big\} \tag{2.34} $$

for $k = 1, 2, \ldots$, where $\lambda > 0$. Shor advises using a "strong" target
signal with characteristics similar to the desired signal so that, when
the target signal is present, $S(t) \approx X(t)$, and when the target signal is
absent, $N(t) \approx X(t)$. Using a "strong" target signal during alternate
T-second intervals, and using an approximate version of (2.34), one
might hope that, on the average, $W_k$ will tend to increase $s_k/n_k$ for
increasing k. Shor also considers an algorithm similar to (2.34) with
the factor $(s_k/n_k)$ removed, and presents some computer simulation
results.
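The mean behavior of (2.34) can be explored with the Python sketch below. It replaces the time averages of (2.33) by their expected values, $s_k = W'R_sW$ and $n_k = W'R_nW$, with $R_s = E\{S(t)S'(t)\}$ and $R_n = E\{N(t)N'(t)\}$. This substitution, the array geometry, and the interference model are assumptions made for illustration. The sketch therefore says nothing about the fluctuations of the actual stochastic algorithm. For a rank-one $R_s = vv'$, the largest attainable ratio is $v'R_n^{-1}v$.

```python
import numpy as np

def shor_mean_update(Rs, Rn, W0, lam, n_iter):
    """Mean-value sketch of (2.34) (hypothetical helper).

    ASSUMPTION: (1/T) int S s_out dt -> Rs W and (1/T) int N n_out dt -> Rn W,
    so that s_k = W'Rs W and n_k = W'Rn W.
    """
    W = np.array(W0, dtype=float)
    for _ in range(n_iter):
        s = W @ Rs @ W
        n = W @ Rn @ W
        W = W + lam * (s / n) * (Rs @ W / s - Rn @ W / n)
    return W, (W @ Rs @ W) / (W @ Rn @ W)

# Illustrative 4-sensor example: broadside signal plus one interferer (assumed).
v = np.ones(4)                          # signal direction
u = np.array([1.0, 0.5, -0.5, 1.0])     # interference direction
Rs = np.outer(v, v)
Rn = np.eye(4) + np.outer(u, u)
W, ratio = shor_mean_update(Rs, Rn, W0=[1.0, 0.0, 0.0, 0.0], lam=0.1, n_iter=5000)
best = v @ np.linalg.solve(Rn, v)       # largest attainable s/n for rank-one Rs
print(ratio, best)
```

The bracketed term in (2.34), scaled by $s_k/n_k$, is half the gradient of $s_k/n_k$. Hence each step is a gradient-ascent step on the ratio, which climbs from its initial value of 0.5 toward `best`.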

Lacoss [32] considers a simplified array processor for which M = 1,
i.e., the array output is simply a weighted sum of the data at the
output of the sensors. Lacoss assumes that $x_\ell(t) = s(t) + n_\ell(t)$ for
$\ell = 1, 2, \ldots, L$; i.e., the signal component at the output of each sensor
is identical. Defining $R_{nn} = E\{N(t)N'(t)\}$, Lacoss considers the
minimization of $W'R_{nn}W$ subject to the constraint that $W'1_L = 1$, where $1_L$
is the $L \times 1$ matrix $1_L = (1, 1, \ldots, 1)'$. This criterion has been termed
"minimum variance distortionless look" because the output of such a
processor, $y(t)$, is given by

$$ y(t) = s(t) + W'N(t) , \tag{2.35} $$

and the variance of $W'N(t)$ is minimized. By using a projected
gradient technique, Lacoss shows that the algorithm

$$ W_{k+1} = W_k - \mu_k \Big(I - \frac{1}{L} 1_L 1_L'\Big) R_{nn} W_k , \tag{2.36} $$

for $k = 0, 1, 2, \ldots$, converges to the desired optimal weight vector, $W^*$,
provided that $W_0'1_L = 1$ and $\sum_k \mu_k = \infty$. An important property of the
above algorithm is obtained by noting that

$$ R_{xx} = E\{X(t)X'(t)\} = E\{s^2(t)\}\, 1_L 1_L' + R_{nn} , \tag{2.37} $$

so that

$$ \Big(I - \frac{1}{L} 1_L 1_L'\Big) R_{xx} = \Big(I - \frac{1}{L} 1_L 1_L'\Big) R_{nn} , \tag{2.38} $$

where it has been assumed that $E\{s(t)n_\ell(t)\} = 0$ for all $\ell = 1, 2, \ldots, L$.
The importance of (2.38) is that $R_{nn}$ may be replaced by $R_{xx}$ in
(2.36) without affecting the convergence properties. Consequently, when
$R_{nn}$ and/or $R_{xx}$ are unknown, one may consider algorithms of the form

$$ W_{k+1} = W_k - \mu_k \Big(I - \frac{1}{L} 1_L 1_L'\Big) R_k W_k , \tag{2.39} $$

where $R_k$ is an unbiased estimate of $R_{xx}$, e.g.,

$$ R_k = X(kT)X'(kT) . \tag{2.40} $$

With $y_k = X'(kT)W_k$, one might consider the algorithm

$$ W_{k+1} = W_k - \mu_k \Big(I - \frac{1}{L} 1_L 1_L'\Big) X(kT)\, y_k . \tag{2.41} $$

Note that $y_k$ and $X(kT)$ are directly available from the processor, so
that no "target signal" is required. One problem which arises in the
implementation of algorithms such as (2.41) is that roundoff and
quantization errors can accumulate, enabling $W_k$ to wander from the
constraint plane.
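The projected-gradient iteration (2.36) and the property (2.38) can both be checked numerically. The Python sketch below uses a constant step $\mu_k = \mu$ and a known, assumed noise covariance. It compares the limit with the closed-form constrained minimizer $R_{nn}^{-1}1_L/(1_L'R_{nn}^{-1}1_L)$. It also checks that adding a signal term $E\{s^2\}1_L1_L'$ to the covariance leaves the iterates unchanged. The function name is hypothetical.

```python
import numpy as np

def lacoss(R, mu, n_iter):
    """Projected-gradient iteration (2.36) with a constant step mu_k = mu."""
    L = R.shape[0]
    one = np.ones(L)
    proj = np.eye(L) - np.outer(one, one) / L   # I - (1/L) 1_L 1_L'
    W = one / L                                  # W_0 satisfies W_0' 1_L = 1
    for _ in range(n_iter):
        W = W - mu * proj @ R @ W
    return W

# Illustrative noise covariance: white noise plus one interferer (assumed values).
u = np.array([1.0, 0.5, -0.5, 1.0])
Rnn = np.eye(4) + np.outer(u, u)
W = lacoss(Rnn, mu=0.2, n_iter=500)
Ri1 = np.linalg.solve(Rnn, np.ones(4))
W_star = Ri1 / Ri1.sum()                        # R^{-1} 1_L / (1_L' R^{-1} 1_L)

# Per (2.37)-(2.38), a signal term E{s^2} 1_L 1_L' does not change the iterates.
W_xx = lacoss(Rnn + 3.0 * np.outer(np.ones(4), np.ones(4)), mu=0.2, n_iter=500)
print(np.allclose(W, W_star), np.allclose(W, W_xx))
```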

Frost [33] considers a more general constraint problem than Lacoss,
with an added feature that deviations from the constraint plane are
corrected for. Frost considers the minimization of $W'R_{xx}W$ subject to
the constraints that

$$ \sum_{\ell=1}^{L} w_{\ell+(m-1)L} = g_m \tag{2.42} $$

for all $m = 1, 2, \ldots, M$. Frost assumes, as does Lacoss [32], that
$x_\ell(t) = s(t) + n_\ell(t)$ for all $\ell = 1, 2, \ldots, L$, so that the constraints
given by (2.42) imply a constraint on the frequency response of the
array to any signal component arriving from the same direction as $s(t)$.
In obvious notation, the constraints given by (2.42) may be expressed
as $C'W = g$, where $g' = (g_1, g_2, \ldots, g_M)$. A projected gradient algorithm,
analogous to (2.36), for the problem at hand is given by

$$ W_{k+1} = W_k - \mu_k \big(I - C(C'C)^{-1}C'\big) R_{xx} W_k , \tag{2.43} $$

for $k = 0, 1, 2, \ldots$, with $C'W_0 = g$. Frost [33] adds the term
$C(C'C)^{-1}(g - C'W_k)$ to the right-hand side of (2.43) to correct for
deviations of $W_k$ from the constraint plane. Frost proposes the
following algorithm for the adaptation of W for unknown $R_{xx}$:

$$ W_{k+1} = W_k - \mu \big(I - C(C'C)^{-1}C'\big) X(kT)\, y_k + C(C'C)^{-1}(g - C'W_k) , \tag{2.44} $$

where $y_k = W_k'X(kT)$.
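A Python sketch of Frost's update (2.44) follows. Its useful feature shows up directly: because of the correction term, $C'W_{k+1} = g$ holds after every step, whatever the data. The two-sensor, two-tap array, the constraint vector g, and the interference model are assumptions for illustration. The constraint matrix C sums the weights attached to each delay, as in (2.42), and the function name is hypothetical.

```python
import numpy as np

def frost(X, C, g, mu):
    """Constrained stochastic update (2.44) over the rows X(kT) of X."""
    CtC_inv = np.linalg.inv(C.T @ C)
    proj = np.eye(C.shape[0]) - C @ CtC_inv @ C.T    # I - C (C'C)^{-1} C'
    W = C @ CtC_inv @ g                              # W_0 with C'W_0 = g
    for x in X:
        y = W @ x                                    # y_k = W_k' X(kT)
        W = W - mu * (proj @ x) * y + C @ CtC_inv @ (g - C.T @ W)
    return W

# Illustrative array: L = 2 sensors, M = 2 taps, so ML = 4 (assumed values).
L, M = 2, 2
C = np.kron(np.eye(M), np.ones((L, 1)))              # column m sums the taps of delay m
g = np.array([1.0, 0.0])
u = np.array([1.0, 0.0, 0.0, 1.0])                   # interference signature
R = np.eye(L * M) + 4.0 * np.outer(u, u)             # covariance of the snapshots
rng = np.random.default_rng(0)
n = 20000
X = rng.standard_normal((n, L * M)) + 2.0 * rng.standard_normal((n, 1)) * u

W = frost(X, C, g, mu=0.01)
W_quiescent = C @ np.linalg.solve(C.T @ C, g)        # starting point, no adaptation
print(C.T @ W, W @ R @ W, W_quiescent @ R @ W_quiescent)
```

In this example the quiescent weights give an output power of 1.5. The constrained minimum, found by hand for this R, is 0.70, and the adapted weights end up close to it.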

Winkler and Schwartz [34] propose a stochastic projected gradient
algorithm for finding the constrained optimum point for a concave or
convex objective function subject to nonlinear constraints. In [35]
Winkler and Schwartz consider a similar problem by making use of penalty
function techniques. Kobayashi [36] discusses the method of steepest
descent and the method of conjugate gradients with projection for the
iterative design of an array processor. Such techniques can be quite
useful for the off-line processing of array data. It is noted that the
adaptive technique proposed by Frost (viz. (2.44)) can be deduced from
the steepest descent procedure given by Kobayashi [36] in much the same
way that (2.44) can be deduced from (2.43).

Widrow et al. [37] consider minimizing $E\{(d(kT) - y(kT))^2\}$ with respect to $W$, where $d(kT)$ is some desired array output. In terms of obvious notation, define

$$\xi(W) = E\{(d_k - y_k)^2\} = \sigma^2 - 2P'W + W'R_{xx}W , \qquad (2.45)$$

where $\sigma^2 = E\{d_k^2\}$, $P = E\{d_k X_k\}$, and $R_{xx} = E\{X_k X_k'\}$. Noting that the gradient of $\xi(W)$ with respect to $W$ is given by $2R_{xx}W - 2P$, a reasonable algorithm for minimizing $\xi(W)$ is

$$W_{k+1} = W_k - \mu (R_{xx} W_k - P) . \qquad (2.46)$$


Widrow et al. [37] consider the following stochastic version of (2.46) for use when $R_{xx}$ and $P$ are unknown:

$$W_{k+1} = W_k - \mu X_k (y_k - d_k) . \qquad (2.47)$$
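As an illustration of (2.47), the sketch below runs the update on synthetic data in which $d_k$ is available and equals $w_{\text{true}}'X_k$ for a hypothetical weight vector $w_{\text{true}}$. With white, noise-free training data the weights converge to $w_{\text{true}}$. The function name `lms` and all numerical values are illustrative assumptions, not taken from [37].

```python
import numpy as np

def lms(X, d, mu):
    """Stochastic gradient update (2.47): W_{k+1} = W_k - mu X_k (y_k - d_k)."""
    W = np.zeros(X.shape[1])
    for x_k, d_k in zip(X, d):
        y_k = W @ x_k                  # array output y_k = W_k' X_k
        W = W - mu * x_k * (y_k - d_k)
    return W

rng = np.random.default_rng(1)
w_true = np.array([0.5, -1.0, 0.25, 2.0])   # hypothetical target weights
X = rng.standard_normal((5000, 4))          # white training vectors X_k
d = X @ w_true                              # noise-free desired output d_k
W = lms(X, d, mu=0.01)
```

Only the instantaneous product $X_k(y_k - d_k)$ is used in place of the gradient $2(R_{xx}W_k - P)$ of (2.45), which is why neither $R_{xx}$ nor $P$ has to be known.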


Noting that $d_k$ is the only quantity in (2.47) which is not directly available (indeed, it is $d_k$ which one wishes to estimate), Widrow et al. propose the use of a "pilot signal" having statistical properties similar to $d_k$. Suppose $g(t)$ is the output of a pilot-signal generator, that $g(t)$ and $d(t)$ have similar statistical properties, and that $d(t) = s_1(t - \beta_1) = s_2(t - \beta_2) = \cdots = s_L(t - \beta_L)$. Define

$$\begin{aligned} X_p'(t) = \big(&g(t+\beta_1), g(t+\beta_2), \ldots, g(t+\beta_L),\\ &g(t+\beta_1-D), g(t+\beta_2-D), \ldots, g(t+\beta_L-D), \ldots,\\ &g(t+\beta_1-(M-1)D), g(t+\beta_2-(M-1)D), \ldots, g(t+\beta_L-(M-1)D)\big) . \end{aligned} \qquad (2.48)$$

The two-mode adaptation procedure proposed in [37] involves using (2.47) with $d_k = 0$ alternately with $X_k = X_p(kT)$ and $d_k = g(kT)$. The one-mode adaptation procedure proposed in [37] makes use of the following algorithm:

$$W_{k+1} = W_k - \mu \left(X_k + X_p(kT)\right)\left(y_k^p - g(kT)\right) , \qquad (2.49)$$

where $y_k^p = W_k'\left(X_k + X_p(kT)\right)$.

Griffiths [38] proposed an algorithm which does not require a pilot signal. Assume that $E\{d(t)n_i(\tau)\} = 0$ for all real $t, \tau$ and for all $i = 1,2,\ldots,L$. Then $P = E\{d_k X_k\} = E\{d_k(S_k + N_k)\} = E\{d_k S_k\}$, so that $P$ is appropriately called a signal correlation vector, which is independent of the noise statistics. Considering that if enough statistics are known to be able to generate an appropriate pilot signal $g(t)$, $P$ is probably also known, one is led to consider the following algorithm proposed by Griffiths [38]:

$$W_{k+1} = W_k - \mu (X_k y_k - P) . \qquad (2.50)$$
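The following sketch applies (2.50) to a hypothetical three-sensor example in which $X_k = s_k a + N_k$, with a unit-variance signal $s_k$, a known steering vector $a$, white sensor noise of variance 0.25, and $d_k = s_k$, so that $P = E\{d_kX_k\} = a$. Only $P$ is supplied to the update and $R_{xx}$ is never formed, yet the time-averaged weights approach the Wiener solution $R_{xx}^{-1}P$. The data model, the step size, and the averaging of the second half of the run are assumptions made for illustration.

```python
import numpy as np

def griffiths_path(X, P, mu):
    """Iterate (2.50): W_{k+1} = W_k - mu (X_k y_k - P), returning every W_k."""
    W = np.zeros(X.shape[1])
    path = np.empty_like(X)
    for k, x_k in enumerate(X):
        y_k = W @ x_k                  # array output
        W = W - mu * (x_k * y_k - P)   # uses the known P instead of d_k
        path[k] = W
    return path

rng = np.random.default_rng(2)
K = 40000
a = np.ones(3)                                           # hypothetical steering vector
s = rng.standard_normal(K)                               # signal, variance 1
X = np.outer(s, a) + 0.5 * rng.standard_normal((K, 3))   # X_k = s_k a + N_k
P = a                                                    # P = E{s_k X_k}
path = griffiths_path(X, P, mu=1e-3)
W_avg = path[K // 2:].mean(axis=0)                       # average out gradient noise
W_opt = np.linalg.solve(np.outer(a, a) + 0.25 * np.eye(3), P)   # R_xx^{-1} P
```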

Tack [39] has proposed an algorithm that is intimately related to (2.50). Suppose the weight vector is to be trained so that $y_k$ is an MMSE estimate of the additive (nonpropagating) sensor noise $n_1(kT)$. If $\{n_1(kT)\}$ is an uncorrelated or "white" sequence with $E\{n_1^2(kT)\} = \sigma_n^2$, and $n_1(kT)$ is uncorrelated with all other signal and noise components, then the algorithm given by (2.50) with $P' = \sigma_n^2(1,0,0,\ldots,0)$ is appropriate. The resulting array has been termed a spatial innovations processor since the "goal" of making $y_k$ a white sequence implies that a cancellation of all of the spatially correlated signal and noise fields is being attempted. Tack [39] has shown that the resulting weight vector can be a very good indicator of the "bearings" of all the propagating components of the signal and noise fields.

While the previously discussed array processors are inherently time-domain approaches, the next system to be discussed lends itself to an interpretation of processing in "frequency-wavenumber space." The following discussion is based on the presentation of Scharf and Farden [40]. In [40] the treatment was limited to a linear (in-line) array of equally spaced sensors. The discussion here applies to more general array geometries.

Let $\zeta_q(t,x,y,z)$ be a real-valued homogeneous random field for $q = 1,2,\ldots,Q$. Let $t$ denote time and $(x,y,z)$ denote spatial coordinates in some suitable Cartesian coordinate system. Furthermore, assume that the $\zeta_q$ are zero mean and uncorrelated, i.e., that

$$E\{\zeta_n(t_1,x_1,y_1,z_1)\,\zeta_m(t_2,x_2,y_2,z_2)\} = 0 \quad \text{for all } (t_1,x_1,y_1,z_1),\ (t_2,x_2,y_2,z_2)$$

and for all $n \neq m$. Let $\rho_\ell$, $\ell = 1,2,\ldots,L$, denote the spatial coordinates of the sensors. Let

$$x_\ell(t) = \sum_{q=1}^{Q} \zeta_q(t,\rho_\ell) + n_\ell(t) \qquad (2.51)$$

for $\ell = 1,2,\ldots,L$, where the $n_\ell(t)$ are real-valued zero mean wide sense stationary stochastic processes and $E\{n_\ell(t)n_k(\tau)\} = 0$ for all real $t,\tau$ and for all $1 \le k,\ell \le L$ such that $k \neq \ell$. Suppose that each of the $\zeta_q$ corresponds to a propagating plane wave. Then there exists a set of constants $\{\beta_{\ell,q} : 1 \le \ell \le L,\ 1 \le q \le Q\}$ such that $\zeta_q(t,\rho_\ell) = \zeta_q(t - \beta_{\ell,q}, \rho_1)$. Consequently, (2.51) can be rewritten as

$$x_\ell(t) = \sum_{q=1}^{Q} \zeta_q(t - \beta_{\ell,q}, \rho_1) + n_\ell(t) \qquad (2.52)$$

for $\ell = 1,2,\ldots,L$. The constants $\{\beta_{\ell,q}\}$ are clearly functions of the array geometry, propagation velocities, and the "directions of propagation." The relationship of the constants $\{\beta_{\ell,q}\}$ to the concept of wavenumber should be clear. Define

$$z_\ell(f_n,kT) = \sum_{m=0}^{M-1} x_\ell\!\left(\left(k-1+\frac{m}{M}\right)T\right) e^{-j2\pi mn/M} , \qquad (2.53)$$

for $n = 0,1,2,\ldots,M-1$, $k = 1,2,\ldots$, where $f_n = n/T$, i.e., $z_\ell(\cdot,\cdot)$ is the discrete Fourier transform (DFT) of $x_\ell(\cdot)$. Defining

$$n_\ell(f_n,kT) = \sum_{m=0}^{M-1} n_\ell\!\left(\left(k-1+\frac{m}{M}\right)T\right) e^{-j2\pi mn/M} , \qquad (2.54)$$

from (2.52) $z_\ell(f_n,kT)$ can be expressed as

$$z_\ell(f_n,kT) = \sum_{m=0}^{M-1}\sum_{q=1}^{Q} \zeta_q\!\left(\left(k-1+\frac{m}{M}\right)T - \beta_{\ell,q},\ \rho_1\right) e^{-j2\pi mn/M} + n_\ell(f_n,kT) . \qquad (2.55)$$
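The sketch below illustrates the DFT of (2.53) and the delay-to-phase relation that underlies the large-$T$ approximations that follow. It relies on the fact that `numpy.fft.fft` uses the same kernel $e^{-j2\pi mn/M}$, and it uses a hypothetical single plane wave: a cosine at exactly $f_{n_0} = n_0/T$, observed at a reference sensor and at a sensor with delay $\beta$. For such a bin-centered tone the relation $z_{\text{del}}(f_{n_0},kT) = e^{-j2\pi f_{n_0}\beta}\, z_{\text{ref}}(f_{n_0},kT)$ is exact; for general signals it is only an approximation that improves as $T$ grows.

```python
import numpy as np

def sensor_dft(x_samples):
    """z_l(f_n, kT) of (2.53), n = 0..M-1, from the M samples x_l((k-1+m/M)T).
    numpy.fft.fft uses the same kernel exp(-j 2 pi m n / M)."""
    return np.fft.fft(x_samples)

def direct_dft(x):
    """Literal evaluation of the sum in (2.53), for comparison."""
    M = len(x)
    m = np.arange(M)
    return np.array([np.sum(x * np.exp(-2j * np.pi * m * n / M)) for n in range(M)])

T, M = 1.0, 16                       # window length and samples per window
n0, beta = 3, 0.07                   # tone in bin n0; hypothetical delay beta
f_n0 = n0 / T
t = np.arange(M) * T / M             # k = 1: sample times (m/M)T
x_ref = np.cos(2 * np.pi * f_n0 * t)            # reference sensor
x_del = np.cos(2 * np.pi * f_n0 * (t - beta))   # same plane wave, delayed by beta
z_ref, z_del = sensor_dft(x_ref), sensor_dft(x_del)
ratio = z_del[n0] / z_ref[n0]        # compare with exp(-j 2 pi f_n0 beta)
print(np.isclose(ratio, np.exp(-2j * np.pi * f_n0 * beta)))   # True
```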

For $T$ "large enough," $E\{z_{\ell_1}(f_{n_1},kT)\,\bar{z}_{\ell_2}(f_{n_2},kT)\} \approx 0$ for all $n_1 \neq n_2$ and for all $\ell_1, \ell_2$, so that for any criterion of optimality involving only second order statistics, one can process the data independently for each $f_n$, $n = 0,1,\ldots,M-1$. The overbar is used to denote complex conjugate. Furthermore, for large $T$,

$$z_\ell(f_n,kT) \approx \sum_{m=0}^{M-1}\sum_{q=1}^{Q} \zeta_q\!\left(\left(k-1+\frac{m}{M}\right)T,\ \rho_1\right) e^{-j2\pi\left(f_n\beta_{\ell,q} + mn/M\right)} + n_\ell(f_n,kT) . \qquad (2.56)$$


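Defining

$$y_q(f_n,kT) = \sum_{m=0}^{M-1} \zeta_q\!\left(\left(k-1+\frac{m}{M}\right)T,\ \rho_1\right) e^{-j2\pi mn/M} , \qquad (2.57)$$

one easily obtains that

$$z_\ell(f_n,kT) \approx \sum_{q=1}^{Q} y_q(f_n,kT)\, e^{-j2\pi f_n\beta_{\ell,q}} + n_\ell(f_n,kT) . \qquad (2.58)$$

Define

$$Z_k'(f_n) = \left(z_1(f_n,kT), z_2(f_n,kT), \ldots, z_L(f_n,kT)\right) , \qquad (2.59)$$

$$N_k'(f_n) = \left(n_1(f_n,kT), n_2(f_n,kT), \ldots, n_L(f_n,kT)\right) , \qquad (2.60)$$

and

$$D_q'(f_n) = \left(e^{-j2\pi f_n\beta_{1,q}}, e^{-j2\pi f_n\beta_{2,q}}, \ldots, e^{-j2\pi f_n\beta_{L,q}}\right) . \qquad (2.61)$$

Then

$$Z_k(f_n) \approx \sum_{q=1}^{Q} y_q(f_n,kT)\, D_q(f_n) + N_k(f_n) . \qquad (2.62)$$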

Suppose that a linear MMSE estimate of $y_1(f_n,kT)$ of the form $\hat{y}_1(f_n,kT) = W'Z_k(f_n)$ is desired. It is easily shown that the desired weight vector, $W^*$, is given by

$$W^*(f_n) = R_z^{-1}(f_n)\, P_1(f_n) , \qquad (2.63)$$

where $R_z(f_n) = E\{\bar{Z}_k(f_n) Z_k'(f_n)\}$, $P_1(f_n) = E\{\bar{Z}_k(f_n)\, y_1(f_n,kT)\} = \sigma_1^2(f_n)\bar{D}_1(f_n)$, and $\sigma_q^2(f_n) = E\{|y_q(f_n,kT)|^2\}$ for $q = 1,2,\ldots,Q$. It is of interest to note that $R_z(f_n)$ can be expressed as

$$R_z(f_n) = \sum_{q=1}^{Q} \sigma_q^2(f_n)\, \bar{D}_q(f_n) D_q'(f_n) + \sigma_n^2(f_n)\, I , \qquad (2.64)$$

where $\sigma_n^2(f_n) = E\{|n_\ell(f_n,kT)|^2\}$. The Sherman-Morrison matrix inversion lemma [41] can be applied $Q$ times to (2.64) to show that $W^*(f_n)$ can be expressed in the form

$$W^*(f_n) = \sum_{q=1}^{Q} \gamma_q \bar{D}_q(f_n) , \qquad (2.65)$$

where the $\gamma_q$ are complicated functions of $\sigma_q^2(f_n)$, $\sigma_n^2(f_n)$, and pairs of inner products $\bar{D}_q'(f_n)D_p(f_n)$ [40]. Consequently, $\hat{y}_1(f_n,kT)$ can be expressed as

$$\hat{y}_1(f_n,kT) = \sum_{q=1}^{Q} \gamma_q\, \bar{D}_q'(f_n) Z_k(f_n) . \qquad (2.66)$$
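A numerical check of (2.63) to (2.66) is sketched below, under the conjugation conventions $R_z = E\{\bar Z_k Z_k'\}$ and $P_1 = \sigma_1^2\bar D_1$. The array is a hypothetical six-sensor line with delays $\beta_{\ell,q} = \ell u_q$, and the powers $\sigma_q^2$, $\sigma_n^2$ are arbitrary. The check confirms that the Wiener weights lie in the span of the $\bar D_q$, which is the content of (2.65), and that $\bar D_q'D_q = L$.

```python
import numpy as np

L, Q, f_n = 6, 3, 1.0
u = np.array([0.0, 0.13, 0.31])            # hypothetical per-source delay slopes
beta = np.outer(np.arange(L), u)           # beta_{l,q} = l * u_q (illustrative line array)
D = np.exp(-2j * np.pi * f_n * beta)       # column q is D_q(f_n), as in (2.61)
sig2 = np.array([1.0, 0.5, 2.0])           # sigma_q^2(f_n), illustrative
sig2_n = 0.1                               # sigma_n^2(f_n), illustrative

Dbar = D.conj()
R_z = (Dbar * sig2) @ D.T + sig2_n * np.eye(L)   # (2.64)
P_1 = sig2[0] * Dbar[:, 0]                        # sigma_1^2 conj(D_1)
W_opt = np.linalg.solve(R_z, P_1)                 # (2.63)

# (2.65): express W_opt as sum_q gamma_q conj(D_q) and measure the residual
gamma, *_ = np.linalg.lstsq(Dbar, W_opt, rcond=None)
residual = np.linalg.norm(Dbar @ gamma - W_opt)
print(residual < 1e-10)                           # True
```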


The operation $\bar{D}_q'(f_n)Z_k(f_n)$ has the interpretation of being the output of a discrete frequency domain conventional beamformer steered to provide a distortionless look at $\zeta_q$.

Suppose for the moment that the $D_q(f_n)$ are known. Defining

$$D' = \left(\bar{D}_1(f_n), \bar{D}_2(f_n), \ldots, \bar{D}_Q(f_n)\right) , \qquad (2.67)$$

$$\Gamma' = (\gamma_1, \gamma_2, \ldots, \gamma_Q) , \qquad (2.68)$$

(2.66) can be rewritten as

$$\hat{y}_1(f_n,kT) = \Gamma' D Z_k , \qquad (2.69)$$

where the notational dependence of $D$, $\Gamma$, and $Z_k$ on $f_n$ has been dropped. The operation $DZ_k$ can be interpreted as a spatial DFT, as discussed in [40]. One may now pose the MMSE problem as follows: find $\Gamma$ such that

$$e(\Gamma) = E\{|\Gamma' D Z_k - y_1(f_n,kT)|^2\} \qquad (2.70)$$

is minimized. Invoking the orthogonal projection theorem, $\Gamma^*$ is seen to be the solution to

$$\bar{D} R_z D' \Gamma - \bar{D} P_1 = 0 . \qquad (2.71)$$

A steepest descent solution is readily found as [40]

$$\Gamma_{k+1} = \Gamma_k - \mu_k \bar{D}\left(R_z D'\Gamma_k - P_1\right) . \qquad (2.72)$$

A stochastic version of (2.72) that can be implemented when $R_z$ is unknown is

$$\Gamma_{k+1} = \Gamma_k - \mu_k \bar{D}\left(\bar{Z}_k \hat{y}_k - P_1\right) , \qquad (2.73)$$

where $\hat{y}_k = Z_k' D' \Gamma_k$. In case the $D_q$ are unknown, one may implement several strategies, as mentioned in [40].
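The sketch below iterates the steepest descent recursion (2.72) in the $Q$-dimensional beam space for a hypothetical six-sensor line array with three sources (delays $\beta_{\ell,q} = \ell u_q$, arbitrary powers), using a fixed step $\mu_k = 1/\lambda_{\max}(\bar D R_z D')$, which is an assumption chosen to keep the iteration stable. The converged $\Gamma$ reproduces the element-space Wiener weights through $W^* = D'\Gamma^*$. The stochastic version (2.73) would replace $R_zD'\Gamma_k$ by the single-snapshot term $\bar Z_k\hat y_k$.

```python
import numpy as np

L, Q, f_n = 6, 3, 1.0
u = np.array([0.0, 0.13, 0.31])                 # hypothetical per-source delay slopes
Dq = np.exp(-2j * np.pi * f_n * np.outer(np.arange(L), u))   # columns D_q(f_n)
sig2, sig2_n = np.array([1.0, 0.5, 2.0]), 0.1   # illustrative powers

R_z = (Dq.conj() * sig2) @ Dq.T + sig2_n * np.eye(L)   # (2.64)
P_1 = sig2[0] * Dq[:, 0].conj()
W_opt = np.linalg.solve(R_z, P_1)                      # element-space solution (2.63)

D = Dq.conj().T                       # (2.67): D' = (conj D_1, ..., conj D_Q); D is Q x L
A = D.conj() @ R_z @ D.T              # conj(D) R_z D' from (2.71), Hermitian positive definite
mu = 1.0 / np.linalg.eigvalsh(A).max()

Gamma = np.zeros(Q, dtype=complex)
for _ in range(5000):
    Gamma = Gamma - mu * D.conj() @ (R_z @ D.T @ Gamma - P_1)   # (2.72)
```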




C.  Critique 


In this section, it is shown that most of the algorithms discussed in Sections II-A and II-B can be written in the form

$$W_{k+1} = W_k + \mu_k (P_k - F_k W_k) , \qquad (2.74)$$

where $W_k$ is a real $p \times 1$ matrix, $\{\mu_k\}$ is a sequence of positive constants, $P_k$ is a real $p \times 1$ random matrix, and $F_k$ is a $p \times p$ real symmetric random matrix. Detailed convergence results for algorithms
that  may  be  cast  into  the  form  of  (2.74)  are  presented  in  Chapters  III 
and  IV.  It  is  also  shown  in  Chapter  III  that  (2.74)  is  a special 
case  of  the  multidimensional  Robbins-Monro  stochastic  approximation 
procedure.  The  purpose  here  is  to  show  that  the  algorithm  given  by 
(2.74)  is  sufficiently  general  to  ensure  the  wide  applicability  of  the 
convergence  results  presented  in  Chapters  III  and  IV. 

It is convenient to start by considering a rather general MMSE filtering problem, and establishing a hierarchy of adaptive algorithms for varying degrees of a priori statistical ignorance [42]. Let $\{S_k\}$ and $\{N_k\}$ be jointly wide-sense stationary $R^p$-valued ($R^p$ is used to denote $p$-dimensional Euclidean space) random processes. Define $X_k = S_k + N_k$, and assume that $E\{N_k\} = 0$ and $E\{S_k N_\ell'\} = 0$ for all $k, \ell$. Suppose that it is desired to estimate some real-valued linear function of $S_k$, say $s_k$, by a linear MMSE estimate of the form $y_k = w'X_k$. Define

$$\xi(w) = E\{(s_k - y_k)^2\} = E\{s_k^2\} - 2w'P + w'R_{xx}w , \qquad (2.75)$$

where $P = E\{s_k X_k\} = E\{s_k S_k\}$, and $R_{xx} = E\{X_k X_k'\}$. It is assumed that $R_{xx}$ is positive definite.


A recursive method for computing the $w = w_0 = R_{xx}^{-1}P$ that minimizes $\xi(w)$ is the gradient descent algorithm:

$$w_{k+1} = w_k - \mu_k (R_{xx} w_k - P) , \qquad (2.76)$$

where $\mu_k > 0$. This algorithm provides an alternative to computing $w_0 = R_{xx}^{-1}P$ directly. The steepest descent algorithm is easily obtained from (2.76) by choosing $\mu_k$ to minimize $\xi(w_{k+1})$. The steepest descent algorithm is given by (2.76) with [43]

$$\mu_k = \frac{(R_{xx}w_k - P)'(R_{xx}w_k - P)}{(R_{xx}w_k - P)'\, R_{xx}\, (R_{xx}w_k - P)} . \qquad (2.77)$$


Note that by letting $P_k = P$, $F_k = R_{xx}$, and $\mu_k$ as in (2.77), (2.74) becomes the steepest descent algorithm. In order to make use of gradient descent algorithms such as (2.76), $R_{xx}$ and $P$ must be known a priori. Efficient techniques for solving $R_{xx}w = P$ for $w = w_0$ are treated in Chapter V for several special forms of $R_{xx}$.
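A minimal sketch of (2.76) with the exact line-search step (2.77) is given below. The matrix $R_{xx}$ is an illustrative symmetric Toeplitz, positive definite matrix and $P$ is arbitrary; the function name `steepest_descent` is hypothetical. The result is compared with a direct solution of $R_{xx}w = P$.

```python
import numpy as np

def steepest_descent(R_xx, P, n_iter=100):
    """Gradient descent (2.76) with the step size (2.77) that minimizes xi(w_{k+1})."""
    w = np.zeros_like(P)
    for _ in range(n_iter):
        r = R_xx @ w - P                  # half the gradient of xi(w)
        rr = r @ r
        if rr == 0.0:                     # already exactly at w_0
            break
        mu = rr / (r @ R_xx @ r)          # (2.77)
        w = w - mu * r
    return w

R_xx = np.array([[2.0, 0.8, 0.3],
                 [0.8, 2.0, 0.8],
                 [0.3, 0.8, 2.0]])        # illustrative positive definite R_xx
P = np.array([1.0, 0.5, -1.0])
w = steepest_descent(R_xx, P)
```

Each step costs one matrix-vector product with $R_{xx}$ and one with the residual, which is why this is an alternative to forming $R_{xx}^{-1}$ when $R_{xx}$ and $P$ are known.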

In case the "pilot vector," $P$, is known a priori but $R_{xx}$ is unknown, one may consider stochastic versions of (2.76) such as

$$w_{k+1} = w_k - \mu_k (X_k X_k' w_k - P) . \qquad (2.78)$$

Note that with appropriate interpretations of $\mu_k$, $X_k$, and $P$, (2.78) is the algorithm (2.50) proposed by Griffiths [38], and that with $F_k = X_k X_k'$ and $P_k = P$, (2.74) becomes (2.78). Furthermore,

$$F_k = \frac{1}{M}\sum_{i=k-M+1}^{k} X_i X_i' \qquad (2.79)$$

and $P_k = P$ in (2.74) is also a reasonable algorithm to consider in this case.

Now, consider the case for which neither $R_{xx}$ nor $P$ is known a priori. For this case, one may consider algorithms of the form

$$w_{k+1} = w_k - \mu_k X_k (X_k' w_k - s_k) . \qquad (2.80)$$

With suitable interpretations of $\mu_k$, $X_k$, and $s_k$, (2.80) is the algorithm (2.47) proposed by Widrow et al. [37], or algorithm (2.22) proposed by Niessen and Willim [10]. With $F_k$ and $P_k$ given by (2.15) and (2.16), respectively, (2.74) becomes the algorithm (2.17) proposed by Gersho [9], or algorithm (2.23) proposed by Schonfeld and Schwartz [13].
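The variant of (2.74) that uses the sliding average (2.79) with $P_k = P$ known can be sketched as follows. The running-sum bookkeeping, the synthetic Gaussian data with covariance $R_{xx}$, the window length $M = 10$, and the step size are all illustrative assumptions. Because successive $F_k$ share $M-1$ terms, $\{F_k\}$ is a correlated sequence even though the underlying $X_k$ are independent.

```python
import numpy as np
from collections import deque

def windowed_path(X, P, mu, M):
    """(2.74) with P_k = P and F_k the sliding average (2.79) of the last M X_i X_i'."""
    p = X.shape[1]
    w = np.zeros(p)
    window = deque()
    S = np.zeros((p, p))                 # running sum of X_i X_i' inside the window
    path = np.empty_like(X)
    for k, x in enumerate(X):
        window.append(x)
        S += np.outer(x, x)
        if len(window) > M:              # drop the oldest outer product
            old = window.popleft()
            S -= np.outer(old, old)
        F_k = S / len(window)            # equals (2.79) once M samples are available
        w = w + mu * (P - F_k @ w)
        path[k] = w
    return path

rng = np.random.default_rng(4)
R_xx = np.array([[1.0, 0.5], [0.5, 1.0]])            # illustrative R_xx
X = rng.standard_normal((40000, 2)) @ np.linalg.cholesky(R_xx).T
P = np.array([1.0, 0.2])                             # "pilot vector", assumed known
path = windowed_path(X, P, mu=0.002, M=10)
w_bar = path[20000:].mean(axis=0)                    # time average of the iterates
w_0 = np.linalg.solve(R_xx, P)
```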

Other algorithms, although not fitting into the MMSE philosophy or directly into the stochastic gradient following philosophy, can, in some cases, be cast into the form of (2.74). With $F_k = (I - \frac{1}{L}\mathbf{1}\mathbf{1}')\hat{R}_k$, where $E\{\hat{R}_k\} = R_{xx}$, and $P_k = 0$, (2.74) becomes the algorithm (2.39) proposed by Lacoss [32]. With $F_k = (D_k - \omega E_k)^{-1}F_k^*$, $P_k = (D_k - \omega E_k)^{-1}P_k^*$, and $\mu_k = \omega$, (2.74) becomes the algorithm (2.26) considered by Kosovych and Pickholtz [15]. With


$$F_k = X_k X_k' - C(C'C)^{-1}C'\left(X_k X_k' - \mu^{-1} I\right) \qquad (2.81)$$

$P_k = \mu^{-1} C(C'C)^{-1} g$, and $\mu_k = \mu$, (2.74) becomes the algorithm (2.44) proposed by Frost [33]. The algorithms proposed by Lucky [7] and Shor [31] do not fit the class of algorithms given by (2.74).
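The identification of Frost's algorithm with (2.74) can be checked numerically for a single step. The sketch draws an arbitrary $W_k$ and $X(kT)$, applies (2.44) directly, and applies (2.74) with $\mu_k = \mu$, $F_k$ from (2.81), and $P_k = \mu^{-1}C(C'C)^{-1}g$; the array sizes, the constraint values, and the random draws are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
L, M, mu = 3, 2, 0.05
C = np.kron(np.eye(M), np.ones((L, 1)))     # constraint matrix of (2.42): C'W = g
g = np.array([1.0, 0.0])
Cp = C @ np.linalg.inv(C.T @ C)              # C(C'C)^{-1}
I = np.eye(L * M)

W = rng.standard_normal(L * M)               # arbitrary current weights W_k
x = rng.standard_normal(L * M)               # arbitrary data vector X(kT)

# Frost's update (2.44)
y = W @ x
W_frost = W - mu * (I - Cp @ C.T) @ x * y + Cp @ (g - C.T @ W)

# The same step written as (2.74) with F_k from (2.81)
XX = np.outer(x, x)
F_k = XX - Cp @ C.T @ (XX - I / mu)
P_k = Cp @ g / mu
W_274 = W + mu * (P_k - F_k @ W)
```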

A simple trick can be used to put complex-valued algorithms such as (2.73) into the form of (2.74). Consider

$$\Gamma_{k+1} = \Gamma_k - \mu_k (F_k \Gamma_k - P_k) , \qquad (2.82)$$

where $F_k$ is Hermitian non-negative definite. Using the superscripts $r$ and $i$ to denote real and imaginary parts, respectively, it is easily shown that

$$\begin{pmatrix}\Gamma_{k+1}^r \\ \Gamma_{k+1}^i\end{pmatrix} = \begin{pmatrix}\Gamma_k^r \\ \Gamma_k^i\end{pmatrix} - \mu_k\left[\begin{pmatrix}F_k^r & -F_k^i \\ F_k^i & F_k^r\end{pmatrix}\begin{pmatrix}\Gamma_k^r \\ \Gamma_k^i\end{pmatrix} - \begin{pmatrix}P_k^r \\ P_k^i\end{pmatrix}\right] . \qquad (2.83)$$

Consequently, with some obvious definitions, (2.82) (and hence (2.73)) can be put into the form of (2.74), with $W_k$ real, $F_k$ real and symmetric, and $P_k$ real. Furthermore, it is easily shown that the resulting real symmetric $F_k$ is positive definite if and only if the Hermitian $F_k$ is positive definite.
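The real embedding (2.83) can be checked with the sketch below: one step of the complex recursion (2.82) is compared with one step of the stacked real recursion, and the real block matrix is confirmed to be symmetric and positive definite when $F_k$ is Hermitian positive definite. The helper name `real_form` and the random test matrices are illustrative.

```python
import numpy as np

def real_form(F, P):
    """Real block matrix and stacked vector of (2.83) for a complex F_k and P_k."""
    Fr, Fi = F.real, F.imag
    F_real = np.block([[Fr, -Fi],
                       [Fi,  Fr]])
    P_real = np.concatenate([P.real, P.imag])
    return F_real, P_real

rng = np.random.default_rng(6)
Q, mu = 3, 0.01
A = rng.standard_normal((Q, Q)) + 1j * rng.standard_normal((Q, Q))
F = A @ A.conj().T                     # Hermitian, positive definite (almost surely)
P = rng.standard_normal(Q) + 1j * rng.standard_normal(Q)
G = rng.standard_normal(Q) + 1j * rng.standard_normal(Q)

G_next = G - mu * (F @ G - P)          # complex recursion (2.82)
F_real, P_real = real_form(F, P)
V = np.concatenate([G.real, G.imag])
V_next = V - mu * (F_real @ V - P_real)   # real recursion (2.83)
```

Since $F_k$ is Hermitian, $F_k^r$ is symmetric and $F_k^i$ is antisymmetric, which is exactly what makes the block matrix symmetric; its eigenvalues are those of $F_k$, each repeated twice.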




III.  EXISTING  CONVERGENCE  RESULTS 


Most of the adaptive signal processing algorithms discussed in Chapter II are sequential algorithms which can be written in the form

$$W_{n+1} = W_n + \mu_n (P_n - F_n W_n) , \qquad (3.1)$$

where $E\{F_n\} = R_{xx}$ and $E\{P_n\} = P$. This algorithm can be viewed as a stochastic gradient-following algorithm or as a stochastic approximation to the solution, $w = w_0$, of the equation

$$R_{xx} w = P . \qquad (3.2)$$

This chapter is devoted to a review of existing results on the convergence properties of algorithms similar to (3.1).


A.  Strong  Convergence  Results  for  Stochastic  Approximation 

In 1951, Robbins and Monro [5] presented a sequential technique for estimating the solution, $\theta$, of the equation

$$M(x) = \alpha , \qquad (3.3)$$

where $M(x)$ is a monotone real valued function defined for all real $x$ and (3.3) is assumed to have the unique solution $x = \theta$. In the Robbins-Monro procedure it is assumed that the nature of $M(x)$ is unknown, and that corresponding to each real $x$ is a random variable $Y(x)$ with distribution function $\Pr[Y(x) \le y] = H(y \mid x)$ such that

$$M(x) = \int y \, dH(y \mid x) . \qquad (3.4)$$

The procedure starts with $X_1 = x_1$, an arbitrary real number, and proceeds via the recursion

$$X_{n+1} = X_n + a_n(\alpha - Y_n) , \qquad (3.5)$$

where $Y_n$ is a random variable having the conditional distribution $\Pr[Y_n \le y \mid X_n = x_n] = H(y \mid x_n)$, and $\{a_n\}$ ($n \ge 1$) is a sequence of positive constants such that

$$\sum_{n=1}^{\infty} a_n = \infty , \qquad \sum_{n=1}^{\infty} a_n^2 < \infty . \qquad (3.6)$$


It should be obvious that (3.1) is a multidimensional version of (3.5). Under the additional conditions that $M(x)$ is nondecreasing, $\left.\frac{dM(x)}{dx}\right|_{x=\theta} > 0$, and $Y(x)$ is bounded with probability one for all real $x$, Robbins and Monro [5] proved that $\lim_{n\to\infty} E\{(X_n - \theta)^2\} = 0$.
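A minimal sketch of the scalar Robbins-Monro recursion (3.5) is given below, using $a_n = 1/n$ (which satisfies (3.6)) and a hypothetical regression function $M(x) = 2(x - \theta)$ observed in bounded, independent uniform noise, with $\alpha = 0$ so that the root is $\theta$. Because the noise is drawn independently at each step, this example satisfies the conditional-distribution assumption discussed below, which correlated training data generally do not.

```python
import numpy as np

def robbins_monro(measure, alpha, x1, n_steps):
    """Scalar recursion (3.5) with a_n = 1/n."""
    x = x1
    for n in range(1, n_steps + 1):
        y_n = measure(x)                 # noisy observation Y_n with mean M(X_n)
        x = x + (1.0 / n) * (alpha - y_n)
    return x

rng = np.random.default_rng(7)
theta = 1.5                              # hypothetical root of M(x) = alpha

def measure(x):
    # hypothetical M(x) = 2 (x - theta) plus bounded, independent noise
    return 2.0 * (x - theta) + rng.uniform(-1.0, 1.0)

x_hat = robbins_monro(measure, alpha=0.0, x1=0.0, n_steps=20000)
```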

Since the pioneering work of Robbins and Monro, a great deal of work has been done on establishing conditions for which schemes similar to (3.5) converge. Kiefer and Wolfowitz [6] considered the problem of estimating the value of $x = \theta$ such that $M(x)$ is a maximum. Blum [44] proved almost sure (a.s.) convergence (i.e., $\Pr[\lim_{n\to\infty} X_n = \theta] = 1$) for both the Robbins-Monro and the Kiefer-Wolfowitz procedures under less restrictive conditions than those in [5] and [6]. In 1954, Blum [45] presented multidimensional versions of both the Robbins-Monro and the Kiefer-Wolfowitz procedures, and proved a.s. convergence for each. Dvoretzky [46] presented a general stochastic approximation procedure which contains the Robbins-Monro and the Kiefer-Wolfowitz procedures as special cases. Dvoretzky proved both mean-square (m.s.) and a.s. convergence for his procedure. Wolfowitz [47] presented a vastly simplified proof of Dvoretzky's Theorem. In 1959, Derman and Sacks [48] gave a simple proof for the a.s. convergence of the multidimensional version of Dvoretzky's procedure. The interested reader is referred to the excellent review papers by Schmetterer [49,50] and Sakrison [51] for a more complete account of the developments in stochastic approximation.

Essentially, all of the above mentioned works contain a common assumption which, for our application to multidimensional adaptive signal processing, severely limits the effectiveness of the results. The assumption under scrutiny is the following: in (3.5) it is assumed that the conditional distribution of $Y_n$ given $X_n = x_n$ coincides with the distribution of $Y(x_n)$ for all real fixed parameter values $x_n$. In particular, this assumption implies that $E\{Y_n \mid X_n = x_n\} = E\{Y(x_n)\}$ $(= M(x_n))$. In terms of the algorithm (3.1), this would require that $E\{F_nW_n - P_n \mid W_n = w\} = E\{F_n w - P_n\}$ for all fixed (parameter) $w$ in $p$-dimensional Euclidean space, $R^p$. That this is an unreasonable condition can be seen by noting that $W_n$ is a rather complicated function of $W_1, W_2, \ldots, W_{n-1}$ as well as $P_1, P_2, \ldots, P_{n-1}$ and $F_1, F_2, \ldots, F_{n-1}$; and that, in general, $\{F_n\}_{n=1}^{\infty}$ is a correlated sequence. Loosely speaking, given the value that the random vector $W_n$ takes on, one is also given some "information" about what values the random matrix $F_n$ (and possibly also $P_n$) is allowed to take on. It is noted that several papers state an alternate assumption which is similar to that above: the conditional distribution of $Y_n$ given $X_1 = x_1, \ldots, X_n = x_n$ coincides with the distribution of $Y(x_n)$ for all real fixed parameter values $x_1, \ldots, x_n$. It is also noted that several stochastic approximation convergence theorems require a weaker condition with the word "distribution" above replaced by "expectation." In practice, such conditions essentially require that $\{Y_n\}$ be either an independent sequence (for the distribution condition) or an uncorrelated sequence (for the expectation condition). Clearly, such conditions severely limit the applicability of the results. Surprisingly, very little has been done to alleviate the restriction due to this assumption. The results of Derman and Sacks [48] will now be discussed to suggest a possible approach for obtaining a more applicable result, as well as to show how the algorithm (3.1) is related to the general stochastic approximation procedure of Dvoretzky.

Derman and Sacks [48] have provided a simple proof of the following multidimensional version of a theorem originally stated by Dvoretzky [46]. The absolute value signs are to be interpreted as the $p$-dimensional Euclidean length.

THEOREM. Let $\{X_n\}$, $\{T_n(X_1,\ldots,X_n)\}$, and $\{Y_n(X_1,\ldots,X_n)\}$ $(n \ge 1)$ be $p$-dimensional random vectors with $X_1$ arbitrary and

$$X_{n+1} = T_n(X_1,\ldots,X_n) + Y_n(X_1,\ldots,X_n) . \qquad (3.7)$$

Assume that

$$E\{Y_n \mid X_1,\ldots,X_n\} \overset{a.s.}{=} 0 , \qquad (3.8)$$

$$\sum_{n=1}^{\infty} E\{|Y_n|^2\} < \infty , \qquad (3.9)$$

and

$$|T_n| \le \max\left(\alpha_n,\ (1+\beta_n)|X_n| - \gamma_n\right) , \qquad (3.10)$$

where $\{\alpha_n\}$, $\{\beta_n\}$, and $\{\gamma_n\}$ are sequences of positive numbers such that

$$\alpha_n \to 0 , \qquad \sum_{n=1}^{\infty}\beta_n < \infty , \qquad \sum_{n=1}^{\infty}\gamma_n = \infty . \qquad (3.11)$$

Then $|X_n| \xrightarrow{a.s.} 0$.


Making use of a technique suggested by Dvoretzky [46], algorithm (3.1) can be written so that the above theorem can be applied. Defining

$$V_n = W_n - w_0 , \qquad (3.12)$$

(3.1) can be written as

$$V_{n+1} = (I - \mu_n F_n)V_n + \mu_n C_n , \qquad (3.13)$$

where

$$C_n = P_n - F_n w_0 . \qquad (3.14)$$

Now, defining

$$Y_n(V_n) = \mu_n\left((R_{xx} - F_n)V_n + C_n\right) , \qquad (3.15)$$

$$T_n(V_n) = V_n - \mu_n R_{xx} V_n , \qquad (3.16)$$

(3.13) becomes

$$V_{n+1} = T_n(V_n) + Y_n(V_n) . \qquad (3.17)$$
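The algebra leading from (3.1) to (3.17) can be confirmed numerically. The sketch below draws one arbitrary realization of $F_n$, $P_n$, and $W_n$, forms $V_n$, $C_n$, $Y_n$, and $T_n$ from (3.12) to (3.16), and checks that $T_n(V_n) + Y_n(V_n)$ equals $W_{n+1} - w_0$. All matrices are random placeholders; $R_{xx}$ is simply some positive definite matrix.

```python
import numpy as np

rng = np.random.default_rng(8)
p, mu_n = 3, 0.1
A = rng.standard_normal((p, p))
R_xx = A @ A.T + p * np.eye(p)          # some positive definite R_xx
w_0 = rng.standard_normal(p)            # plays the role of R_xx^{-1} P
B = rng.standard_normal((p, p))
F_n = B @ B.T                           # one realization of F_n
P_n = rng.standard_normal(p)            # one realization of P_n
W_n = rng.standard_normal(p)

V_n = W_n - w_0                                   # (3.12)
C_n = P_n - F_n @ w_0                             # (3.14)
Y_n = mu_n * ((R_xx - F_n) @ V_n + C_n)           # (3.15)
T_n = V_n - mu_n * R_xx @ V_n                     # (3.16)
W_next = W_n + mu_n * (P_n - F_n @ W_n)           # one step of (3.1)
print(np.allclose(W_next - w_0, T_n + Y_n))       # True, i.e. (3.17)
```

The split puts the deterministic contraction $I - \mu_nR_{xx}$ into $T_n$ and everything random into $Y_n$, which is what lets (3.10) be checked without any assumption on the data.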


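Define the matrix norm of a $p \times p$ matrix $A$ by

$$\|A\| = \sup_{|q| \le 1} |Aq| , \qquad (3.18)$$

which for $A$ real and symmetric yields

$$\|A\| = \max_i |\lambda_i(A)| , \qquad (3.19)$$

where $\{\lambda_i(A)\}_{i=1}^{p}$ are the $p$ eigenvalues of $A$ and $q$ is a $p$-element column vector.

Let $\{\mu_n\}_{n=1}^{\infty}$ and $\{\eta_n\}_{n=1}^{\infty}$ be nonincreasing sequences of positive numbers such that $\mu_n \to 0$, $\eta_n \to 0$, $\sum_{n=1}^{\infty}\mu_n = \infty$, and $\sum_{n=1}^{\infty}\mu_n\eta_n = \infty$. (Given $\sum\mu_n = \infty$, such an $\{\eta_n\}$ always exists; for example, $\eta_n = 1/(\mu_1 + \cdots + \mu_n)$ works by the Abel-Dini theorem.) Since $R_{xx}$ is assumed to be positive definite, its eigenvalues satisfy $0 < \lambda_{\min} \le \lambda_i(R_{xx}) \le \lambda_{\max}$, where $\lambda_{\min} = \min_i \lambda_i(R_{xx})$ and $\lambda_{\max} = \max_i \lambda_i(R_{xx})$; since $\mu_n \to 0$, there exists an $n_0$ such that $\mu_n < \lambda_{\max}^{-1}$ for all $n \ge n_0$. For such $n$ every eigenvalue $1 - \mu_n\lambda_i(R_{xx})$ of $I - \mu_nR_{xx}$ lies in $(0, 1 - \mu_n\lambda_{\min}]$, so by (3.19), for all $n \ge n_0$ and for all $u \in R^p$,

$$|T_n(u)| = |u - \mu_n R_{xx} u| \le \|I - \mu_n R_{xx}\|\,|u| = |u|(1 - \mu_n\lambda_{\min}) .$$

For all $|u| \ge \eta_n$, $|u|(1 - \mu_n\lambda_{\min}) \le |u| - \mu_n\lambda_{\min}\eta_n$; whereas, for $|u| < \eta_n$, $|u|(1 - \mu_n\lambda_{\min}) < \eta_n(1 - \mu_n\lambda_{\min})$. It follows that for all $n \ge n_0$,

$$|T_n(u)| \le \max\left(\eta_n(1 - \mu_n\lambda_{\min}),\ |u| - \mu_n\lambda_{\min}\eta_n\right) , \qquad (3.20)$$

so that, with $X_n = V_n$, $\alpha_n = \eta_n(1 - \mu_n\lambda_{\min})$, $\beta_n = 0$, and $\gamma_n = \mu_n\lambda_{\min}\eta_n$, (3.7), (3.10), and (3.11) are satisfied. The following corollary has thus been established.

COROLLARY. Let $V_n$, $C_n$, $Y_n$, $T_n$ be $p$-dimensional random vectors given by (3.12) to (3.17). Let $\{\mu_n\}_{n=1}^{\infty}$ be a nonincreasing sequence of positive numbers with $\mu_n \to 0$ and $\sum_{n=1}^{\infty}\mu_n = \infty$. Assume that (3.8) and (3.9) are satisfied. Then $|V_n| \xrightarrow{a.s.} 0$.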
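The two facts used in the argument above, (3.19) and the bound (3.20), can be spot-checked numerically. The sketch assumes an illustrative positive definite $R_{xx}$, picks $\mu_n < \lambda_{\max}^{-1}$ and an arbitrary $\eta_n$, verifies $\|I - \mu_nR_{xx}\| = 1 - \mu_n\lambda_{\min}$, and then tests (3.20) on random vectors $u$ whose lengths lie below, at, and above $\eta_n$.

```python
import numpy as np

rng = np.random.default_rng(10)
R_xx = np.array([[3.0, 1.0, 0.0],
                 [1.0, 2.0, 0.5],
                 [0.0, 0.5, 1.0]])            # illustrative positive definite R_xx
lam = np.linalg.eigvalsh(R_xx)                # ascending eigenvalues
lam_min, lam_max = lam[0], lam[-1]
mu_n = 0.9 / lam_max                          # any mu_n < 1/lam_max
eta_n = 0.3                                   # any eta_n > 0

# (3.19): for symmetric I - mu_n R_xx the norm (3.18) is the largest |eigenvalue|
norm = np.linalg.norm(np.eye(3) - mu_n * R_xx, 2)
print(np.isclose(norm, 1 - mu_n * lam_min))   # True

alpha_n = eta_n * (1 - mu_n * lam_min)
gamma_n = mu_n * lam_min * eta_n
ok = True
for scale in (0.01, 0.1, 0.3, 1.0, 10.0):     # |u| below, at, and above eta_n
    for _ in range(1000):
        u = rng.standard_normal(3)
        u *= scale / np.linalg.norm(u)
        T_u = u - mu_n * R_xx @ u             # T_n(u) from (3.16)
        bound = max(alpha_n, np.linalg.norm(u) - gamma_n)   # right side of (3.20)
        ok = ok and np.linalg.norm(T_u) <= bound + 1e-12
print(ok)                                     # True
```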

The difficulty with the above corollary is, of course, the establishment of (3.8) and (3.9). Condition (3.9) can be dropped by requiring that $E\{|\mu_n^{-1}Y_n(u)|^2\}$ be uniformly bounded for all $n \ge 1$ and for all $u \in R^p$, and that $\sum_{n=1}^{\infty}\mu_n^2 < \infty$. This uniform boundedness condition will be discussed in more detail later. Condition (3.8) has the same limitation mentioned previously. Dvoretzky [46] shows that (3.8) may be replaced by

$$\sum_{n=1}^{\infty}\ \sup_{x_1,\ldots,x_n} \left|E\{Y_n \mid x_1,\ldots,x_n\}\right| < \infty , \qquad (3.21)$$

or by the condition that each element of

$$\sum_{n=1}^{\infty} E\{Y_n \mid x_1,\ldots,x_n\} \qquad (3.22)$$

be uniformly bounded and convergent for all sequences $x_1, x_2, \ldots, x_n, \ldots$. Unfortunately, conditions such as (3.21) or (3.22) are extremely difficult to verify in practice.

The method of proof of Derman and Sacks [48] can be modified to obtain yet another alternative to (3.8). Let $\mathcal{P}_n = \mathcal{P}_{n,x_1,x_2,\ldots,x_n}$ be random orthogonal transformations such that $\mathcal{P}_n T_n = (|T_n|, 0, \ldots, 0)'$, and define $Z_n = \mathcal{P}_n Y_n$, where $Z_n' = (Z_{n1}, Z_{n2}, \ldots, Z_{np})$. If

$$\sum_{n=1}^{m} \frac{Z_{n1}(1+\beta_n)^2}{2\alpha_n + \beta_n} \qquad\text{and}\qquad \sum_{n=1}^{m} \frac{Z_{n1}^2(1+\beta_n)^4}{(2\alpha_n + \beta_n)^2}$$

converge a.s. to random variables as $m \to \infty$, then condition (3.8) of the theorem can be deleted and the theorem remains true. Although this may suggest a reasonable approach, the establishment of these conditions appears difficult, even for the special case considered in the corollary. Also, condition (3.9) or its (stronger) alternative of uniform boundedness of $E\{|\mu_n^{-1}Y_n(u)|^2\}$ is somewhat restrictive. In any case, this approach will not be pursued here.

Sakrison [52] presents a continuous Kiefer-Wolfowitz procedure and proves mean-square convergence under two assumptions: an a.s. bounded process, and a requirement on the rate at which the minimum mean-square prediction error approaches its asymptotic value. Sakrison [51] suggests that this condition is applicable to the Robbins-Monro procedure. More recently, some convergence results for algorithms of the form of (3.1) with $\mu_n = \mu$ = constant have appeared.


B.  Weaker  Convergence  Results  for  Stochastic  Approximation 


Daniell [53] investigates a kind of mean-square convergence for algorithms similar to (3.1) with $\mu_n = \mu$ = constant. In fact, letting $\{X_n\}_{n=1}^{\infty}$ be a sequence of $p$-dimensional random vectors, $\mu_n = \mu$, and $F_n = X_nX_n'$, (3.1) is precisely the algorithm considered by Daniell. Rewriting (3.1) in the form of (3.13), with $C_n$ given by (3.14), Daniell [53] proves the following theorem. The trace of a matrix $A$ is denoted by $\mathrm{tr}\{A\}$.


THEOREM. Define $A_i = X_iX_i' - R_{xx}$. Suppose that (i) there exists a sequence of positive numbers converging to zero such that for every pair of positive integers $k$ and $\ell$
A/iu.r


\xi*(:r--',xi-i‘ri-i]  < no  ; 


(a.  20 


(iii) there exists a sequence of positive real constants $\{\sigma_k\}$ converging to zero such that for all integers $i \ge 1$ and for all integers $k$, $L$, $M$ satisfying $i < i+k \le M \le L$,

$$\mathrm{tr}\left\{E\{A_L A_M \mid X_1, C_1, \ldots, X_i, C_i\} - E\{A_L A_M\}\right\} < \sigma_k ;$$

and (iv) there exists a positive constant $B$ such that

$$E\{|C_i|^2\} < B^2 , \qquad (3.27)$$

$$E\{|X_i|^4\,|C_i|^2\} < B^2 . \qquad (3.28)$$

Then for all $\delta > 0$ there exists a $\mu^* > 0$ such that for all $0 < \mu < \mu^*$ there exists a positive integer $k_\mu(\delta)$ such that for all $k > k_\mu(\delta)$,

$$E\{|V_k|^2\} < \delta . \qquad (3.30)$$


The kind of convergence obtained by Daniell is clearly weaker than mean-square convergence; however, by replacing $\mu$ with a nonincreasing sequence $\{\mu_n\}$ of positive constants converging to zero, it seems reasonable to conclude that the proof could be modified to obtain mean-square convergence. For applications which require the algorithm to track slowly time-varying parameters, a fixed step size seems to be a reasonable as well as a widely used technique.

Senne [54] performed a simulation study of an algorithm similar to that treated by Daniell, and noted that when the process $\{X_n\}$ is correlated, a bias is introduced which increases with step size, $\mu$. An analytical justification for this can be obtained by taking the expectation of both sides of (3.1) to obtain

$$E\{W_{n+1}\} = E\{W_n\} + \mu_n\left(P - E\{F_nW_n\}\right) . \qquad (3.31)$$

Suppose that $E\{W_n\} = w_0$ and that $\mu_n = \mu$. If $\{F_n\}$ is a correlated sequence, then $F_n$ and $W_n$ are also correlated, so that in general $E\{F_nW_n\} \neq E\{F_n\}E\{W_n\} = R_{xx}w_0 = P$, and hence $E\{W_{n+1}\} \neq w_0$. From this simple argument, it should be concluded that in order to have any hope for the algorithm to even be asymptotically unbiased, the condition that $\mu_n \to 0$ is essential. It is interesting to note that if $\{\mu_n\}$ is a sequence of positive constants converging to zero and the variance of each element in the correction vector $\mu_n(P_n - F_nW_n)$ is decreasing with increasing $n$, so that the variance of each element of $W_n$ will also be decreasing, then $F_n$ and $W_n$ will "decorrelate."

The main issue here is to determine the limitations of the assumptions made in Daniell's theorem, i.e., to determine the types of correlated processes $\{X_n\}$ for which the theorem is applicable. Daniell [55] provides several examples of processes which satisfy the conditions of the above theorem; however, for the "correlated cases" considered, it is assumed that the process $\{X_n\}$ is bounded. Conditions (3.25) and (3.26) indicate that this boundedness assumption is essential for the application of the above theorem.
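The bias mechanism behind (3.31) can be seen in a deliberately simple scalar example, which is an illustration and not Senne's simulation. Let $F_n$ alternate between $a$ and $b$ with a random starting phase, so that $\{F_n\}$ is stationary with $E\{F_n\} = R = (a+b)/2$ but perfectly correlated, and let $P_n = P$, so $w_0 = P/R$. With a fixed step $\mu$ the iterates settle into a period-two orbit, and averaging over the two phases gives the steady-state value of $E\{W_n\}$. For $a = 1$, $b = 3$, $P = 2$ (so $w_0 = 1$) this mean is exactly $36/37$ when $\mu = 0.1$, and the bias shrinks roughly in proportion to $\mu$.

```python
import math

def steady_state_mean(a, b, P, mu, n_steps=20000):
    """Run W_{n+1} = W_n + mu (P - F_n W_n) with F_n = a, b, a, b, ...

    After the transient, W_n alternates between two values; their average is
    E{W_n} when the phase of {F_n} is uniformly random.
    """
    w = 0.0
    for n in range(n_steps):
        F_n = a if n % 2 == 0 else b
        w = w + mu * (P - F_n * w)
    F_next = a if n_steps % 2 == 0 else b
    w_next = w + mu * (P - F_next * w)
    return 0.5 * (w + w_next)

a, b, P = 1.0, 3.0, 2.0          # E{F_n} = 2, so w_0 = P / 2 = 1
bias_large = steady_state_mean(a, b, P, mu=0.1) - 1.0
bias_small = steady_state_mean(a, b, P, mu=0.01) - 1.0
print(bias_large, bias_small)    # about -0.0270 and -0.0025
```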

Kim and Davisson [56] treat another algorithm which fits into the framework of (3.1). Let $\{s_n\}$ and $\{x_n\}$ be jointly stationary $M$-dependent scalar stochastic processes. A sequence of random variables $\{y_n\}$ is said to be $M$-dependent if for all index sets $I$, $J$ with $\min_{n \in I,\, m \in J}|n - m| > M$, the two sets of random variables $\{y_n : n \in I\}$ and $\{y_m : m \in J\}$ are statistically independent. Define $X_n' = (x_n, x_{n-1}, \ldots, x_{n-p+1})$, let

$$P_n = \frac{1}{K}\sum_{m=nK}^{(n+1)K-1} s_m X_m , \qquad F_n = \frac{1}{K}\sum_{m=nK}^{(n+1)K-1} X_m X_m' ,$$

and $\mu_n = \mu$ = constant. Substituting into (3.1) yields

$$W_{n+1} = W_n + \frac{\mu}{K}\sum_{m=nK}^{(n+1)K-1} X_m\left(s_m - X_m'W_n\right) . \qquad (3.32)$$

Kim and Davisson [56] show, under the above assumptions, that $E\{|W_n - w_0|^2\}$ can be made arbitrarily small for $n$ large enough by choosing $\mu$ small enough and $K$ large enough. Although not explicitly stated by Kim and Davisson [56], their analysis also requires the existence of all fourth-order moments for both $\{s_n\}$ and $\{x_n\}$. The results of Kim and Davisson given above can likely be modified by replacing $\mu$ with a nonincreasing sequence $\{\mu_n\}$ of positive constants converging to zero to obtain mean-square convergence.
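A short sketch of the block update (3.32) on $M$-dependent data is given below. The input is a hypothetical moving-average sequence $x_n = e_n + 0.8e_{n-1}$ with white $e_n$ (so $\{x_n\}$ is 1-dependent), $p = 2$, and $s_n = X_n'w_{\text{true}}$ plus small independent noise, so that $w_0 = w_{\text{true}}$; the values of $\mu$ and $K$ are illustrative, and the function name `block_lms` is hypothetical.

```python
import numpy as np

def block_lms(X, s, mu, K):
    """Block update (3.32) with constant mu and block length K."""
    W = np.zeros(X.shape[1])
    for n in range(len(s) // K):
        Xb, sb = X[n * K:(n + 1) * K], s[n * K:(n + 1) * K]
        W = W + (mu / K) * Xb.T @ (sb - Xb @ W)   # (mu/K) sum_m X_m (s_m - X_m' W_n)
    return W

rng = np.random.default_rng(11)
N = 40000
e = rng.standard_normal(N + 1)
x = e[1:] + 0.8 * e[:-1]                 # MA(1) input: a 1-dependent sequence
X = np.column_stack([x[1:], x[:-1]])     # rows X_n' = (x_n, x_{n-1}), p = 2
w_true = np.array([1.0, -0.5])           # hypothetical target weights
s = X @ w_true + 0.1 * rng.standard_normal(len(X))
W = block_lms(X, s, mu=0.05, K=20)
```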

Schmetterer  [50]  presents  the  following  theorem,  a result  which  is 
quite  similar  in  nature  to  the  results  discussed  above  of  Daniell,  and 
Kim  and  Davisson. 


THEOREM. Let $\{a_n\}$ be a sequence of positive real numbers satisfying $\sum_{\nu=1}^{\infty} a_\nu = \infty$. Let $x_n$ and $y_n$ be $p$-dimensional random vectors such that

$$x_{n+1} = x_n - a_n y_n \qquad (3.33)$$

for every $n \ge 1$. Furthermore, for every $n \ge 1$, let $M_n(\cdot)$ be a Borel measurable mapping from $R^p$ to $R^p$. Assume that $E\{|y_n - M_n(x_n)|^2\}$ exists for every $n \ge 1$, and that there exists a real $C > 0$ such that

$$E\{|y_n - M_n(x_n)|^2\} \le C, \quad n \ge 1. \qquad (3.34)$$

Furthermore, suppose that there exists a $K > 0$ which satisfies

$$K a_n < 1, \quad n \ge 1, \qquad (3.35)$$

such that, for every $n \ge 1$ and $x \in R^p$, the inequality

$$|x - a_n M_n(x)| \le (1 - K a_n)\,|x| \qquad (3.36)$$

holds. If $E\{|x_1|^2\}$ exists, then $E\{|x_n|^2\}$ exists for every $n \ge 2$. Furthermore,

$$\left(E\{|x_n|^2\}\right)^{1/2} \le C^{1/2}K^{-1} + \left(\left(E\{|x_1|^2\}\right)^{1/2} - C^{1/2}K^{-1}\right)\prod_{\nu=1}^{n-1}(1 - K a_\nu). \qquad (3.37)$$

It follows that

$$\limsup_{n\to\infty}\left(E\{|x_n|^2\}\right)^{1/2} \le C^{1/2}K^{-1}. \qquad (3.38)$$
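The right-hand side of (3.37) is exactly the solution of the scalar recursion $e_{n+1} = (1 - Ka_n)e_n + a_nC^{1/2}$. This gives one way to see where (3.37) comes from: by (3.33), (3.34), (3.36), and the Minkowski inequality, $(E\{|x_n|^2\})^{1/2}$ obeys this recursion as an inequality. The sketch below evaluates both the closed form and the recursion, using hypothetical helper names and illustrative gains $a_\nu = 1/\nu$ with $K = 0.5$. It shows the bound approaching $C^{1/2}K^{-1}$, as in (3.38).

```python
import math

def schmetterer_bound(e1, C, K, a, n):
    """Right-hand side of (3.37): bound on (E|x_n|^2)^{1/2} given e1 = (E|x_1|^2)^{1/2}.
    a(v) returns the gain a_v, assumed to satisfy K * a(v) < 1 as in (3.35)."""
    target = math.sqrt(C) / K
    prod = 1.0
    for v in range(1, n):
        prod *= 1.0 - K * a(v)
    return target + (e1 - target) * prod

def recursion(e1, C, K, a, n):
    """Iterate e_{v+1} = (1 - K a_v) e_v + a_v C^{1/2}, which (3.37) solves with equality."""
    e = e1
    for v in range(1, n):
        e = (1.0 - K * a(v)) * e + a(v) * math.sqrt(C)
    return e

def gain(v):
    return 1.0 / v        # sum of a_v diverges, and K * a_v <= 0.5 < 1 for K = 0.5

b10 = schmetterer_bound(3.0, 1.0, 0.5, gain, 10)
r10 = recursion(3.0, 1.0, 0.5, gain, 10)
b_big = schmetterer_bound(3.0, 1.0, 0.5, gain, 200000)
print(b10, r10, b_big)    # b_big is close to C^{1/2}/K = 2
```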


Although condition (3.34) severely limits the applicability of the above theorem, some comments on it are in order. First, note that no conditional expectation or conditional distribution restriction is made. Second, (3.37) gives a bound on the mean norm-squared error for all $n \ge 1$. Hence, if the theorem could be applied in a practical situation, it would be quite useful. Note that (3.13) can be written as

$$V_{n+1} = V_n - \mu_n\left(F_n V_n - C_n\right), \qquad (3.39)$$

with $C_n = P_n - F_n w_o$. Substituting $x_n = V_n$, $a_n = \mu_n$, and $y_n = F_n V_n - C_n$ then yields (3.33). Letting $M_n(v) = R_{xx} v$ for all $v \in R^p$, (3.34) requires that there exist a $C > 0$ such that

$$E\{|F_n V_n - C_n - R_{xx} V_n|^2\} \le C, \quad n \ge 1. \qquad (3.40)$$

Since $R_{xx}$ is assumed to be positive definite with minimum eigenvalue $\lambda_{\min}$, taking $K = \lambda_{\min}$ in (3.35) establishes (3.36), provided also that $\mu_n \lambda_{\max}(R_{xx}) \le 1$. Apparently, (3.40) is difficult to establish unless $V_n$ is uniformly bounded (in $n$) with probability one. This suggests a possible application to truncated algorithms. Suppose $w_o$ is known a priori to lie within some closed convex parameter space $\mathcal{P}$, and consider the following truncated version of (3.1):

$$W_{n+1} = \left[W_n + \mu_n\left(P_n - F_n W_n\right)\right]_{\mathcal{P}}, \qquad (3.41)$$

where $[x]_{\mathcal{P}} = x$ if $x \in \mathcal{P}$, and $[x]_{\mathcal{P}}$ is the boundary point of $\mathcal{P}$ closest to $x$ if $x \notin \mathcal{P}$. Defining $\mathcal{P}_w = \{x : x + w_o \in \mathcal{P}\}$ and $V_n = W_n - w_o$, (3.41) becomes

$$V_{n+1} = \left[V_n - \mu_n\left(F_n V_n + F_n w_o - P_n\right)\right]_{\mathcal{P}_w}. \qquad (3.42)$$

Clearly, for a bounded $\mathcal{P}$ this algorithm is a.s. uniformly bounded, and it can be shown to satisfy (3.40). Unfortunately, certain analytic difficulties arise when attempting to establish (3.36) for this algorithm. A result similar to the above theorem of Schmetterer for algorithms such as (3.42) would be highly desirable.
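The truncation $[\cdot]_{\mathcal{P}}$ in (3.41) can be sketched for the special case in which $\mathcal{P}$ is a closed ball. For a ball, the closest boundary point is obtained by radial scaling. The ball radius, the gain $\mu_n = 1/(n+10)$, the independent Gaussian data, and the helper names below are hypothetical choices made only to show the mechanics. The sketch does not address the open question of establishing (3.36).

```python
import math
import random

def project_ball(x, center, radius):
    """[x]_P for the closed ball P = {y : |y - center| <= radius}: x itself if x is in P,
    otherwise the boundary point of P closest to x."""
    d = [xi - ci for xi, ci in zip(x, center)]
    norm = math.sqrt(sum(di * di for di in d))
    if norm <= radius:
        return list(x)
    return [ci + radius * di / norm for ci, di in zip(center, d)]

def truncated_lms(w_true, n_iter, radius=2.0, seed=1):
    """Truncated algorithm (3.41) with F_n = X_n X_n', P_n = s_n X_n, mu_n = 1/(n + 10),
    and P a hypothetical ball of the given radius about the origin (assumed to contain w_o)."""
    rng = random.Random(seed)
    p = len(w_true)
    w = [0.0] * p
    max_norm = 0.0
    for n in range(1, n_iter + 1):
        X = [rng.gauss(0.0, 1.0) for _ in range(p)]
        s = sum(xi * wi for xi, wi in zip(X, w_true)) + 0.1 * rng.gauss(0.0, 1.0)
        err = s - sum(xi * wi for xi, wi in zip(X, w))     # P_n - F_n W_n = X_n (s_n - X_n' W_n)
        mu = 1.0 / (n + 10)
        w = project_ball([wi + mu * xi * err for wi, xi in zip(w, X)], [0.0] * p, radius)
        max_norm = max(max_norm, math.sqrt(sum(wi * wi for wi in w)))
    return w, max_norm

w, max_norm = truncated_lms([1.0, -0.5], 20000)
print(w, max_norm)    # iterates never leave the ball of radius 2
```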

C. Critique

This chapter has reviewed in detail several existing convergence results applicable to algorithms of the form (3.1). It has also suggested how those results might be modified to obtain reasonable conditions under which $W_n \to w_o$ in some meaningful probabilistic sense when the sequence $\{F_n\}$ (and possibly also $\{P_n\}$) is correlated.

In summary, the conditions imposed by Robbins and Monro, Dvoretzky, and Derman and Sacks, for example, employ ingenious mathematical constructs to permit general applicability of stochastic approximation results. From a practical point of view, however, it cannot be emphasized too strongly that their conditions are easily established for (3.1) only when $\{P_n - F_n w\}_{n=1}^{\infty}$ is an independent sequence of $R^p$-valued random variables, where $w$ is a fixed parameter. Consequently, the existing results are not well suited to analyzing structures that must be adapted in correlated environments. As repeatedly mentioned, the restrictive assumptions are the "conditional distribution" and "conditional expectation" assumptions. The only results (known to the author) that avoid these restrictions are those of Daniell, of Kim and Davisson, and of Schmetterer, mentioned in Section III-B. The next chapter establishes easily verified conditions under which $W_n$, as given by (3.1), converges a.s. to $w_o$. These conditions relax the "conditional expectation" and "conditional distribution" assumptions of existing theorems and prove convergence in correlated environments of practical interest.


IV. NEW CONVERGENCE RESULTS

This chapter establishes new, easily verified conditions that ensure the a.s. convergence to $w_o$ of $W_n$ as given by (3.1). Section IV-A contains the main results of this dissertation. The proof of the theorem relies heavily on the techniques presented by Albert and Gardner [57]. The proof of the practically useful result, Corollary 2, makes strong use of the results of Serfling ([58] and [59]). Section IV-B applies the results of Section IV-A to the specific algorithms treated in Chapter II, providing analytical justification for existing and proposed applications of these algorithms. Section IV-C treats a highly specialized form of (3.1), which seemingly suggests a "maximum convergence rate" for certain algorithms. Section IV-D discusses open issues regarding the convergence properties of algorithms that fit the framework of (3.1).


A. Almost Sure Convergence Results

As shown in Chapter III, the algorithm

$$W_{n+1} = W_n + \mu_n\left(P_n - F_n W_n\right) \qquad (4.1)$$

can be written in the form

$$V_{n+1} = \left(I - \mu_n F_n\right)V_n + \mu_n C_n, \qquad (4.2)$$

where

$$V_n = W_n - w_o, \qquad (4.3)$$

$$C_n = P_n - F_n w_o, \qquad (4.4)$$

$$w_o = R_{xx}^{-1} P, \qquad (4.5)$$

$$E\{F_n\} = R_{xx}, \qquad (4.6)$$

$$E\{P_n\} = P. \qquad (4.7)$$

It is assumed that $R_{xx}$ is a real symmetric positive definite $p \times p$ matrix and that $W_n$ and $P_n$ are elements of $R^p$. It is further assumed that $\{\mu_n\}$ is a nonincreasing sequence of positive constants and that $\{F_n\}$ is a random sequence of real symmetric non-negative definite $p \times p$ matrices. For $\{A_i\}$ a sequence of $p \times p$ matrices, define

$$\prod_{i=l}^{k} A_i = \begin{cases} A_k A_{k-1}\cdots A_{l+1}A_l, & \text{if } k \ge l,\\ I, & \text{if } k < l.\end{cases} \qquad (4.8)$$

Iterating (4.2) then gives

$$V_{n+1} = \prod_{k=1}^{n}\left(I - \mu_k F_k\right)V_1 + \sum_{k=1}^{n}\Big(\prod_{j=k+1}^{n}\left(I - \mu_j F_j\right)\Big)\mu_k C_k. \qquad (4.9)$$

Defining

$$Q_{k,n} = \prod_{j=k}^{n}\left(I - \mu_j F_j\right), \qquad (4.10)$$

$$A_n = \sum_{k=1}^{n} Q_{k+1,n}\,\mu_k C_k, \qquad (4.11)$$

(4.9) becomes

$$V_{n+1} = Q_{1,n} V_1 + A_n. \qquad (4.12)$$

Recall that the matrix norm for a $p \times p$ matrix $A$ is defined by

$$\|A\| = \sup_{|x| \le 1} |Ax|, \quad x \in R^p. \qquad (4.13)$$

For $A$ real and symmetric, this coincides with

$$\|A\| = \max_{i \in \{1,2,\ldots,p\}} |\lambda_i(A)|, \qquad (4.14)$$

where $\{\lambda_i(A)\}$ are the $p$ eigenvalues of $A$. Denote the minimum and maximum eigenvalues of $A$ by $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$, respectively. The notation and definitions (4.1)-(4.14) are assumed throughout the remainder of this section. With them established, the main result of this dissertation can now be stated.
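The bookkeeping in (4.8)-(4.12), in particular the ordering of the matrix products, can be checked numerically. The sketch below uses hypothetical helper names, $p = 2$, $\mu_j = 1/j$, $F_j = X_jX_j'$, and arbitrary random $C_j$. It iterates (4.2) directly and compares the result with $Q_{1,n}V_1 + A_n$.

```python
import random

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def Q(k, n, mus, Fs):
    """Q_{k,n} from (4.10): (I - mu_n F_n)...(I - mu_k F_k), later indices on the left as in (4.8).
    Indices are 1-based; returns the identity when k > n."""
    p = len(Fs[0])
    out = [[1.0 if r == c else 0.0 for c in range(p)] for r in range(p)]
    for j in range(k, n + 1):
        step = [[(1.0 if r == c else 0.0) - mus[j - 1] * Fs[j - 1][r][c] for c in range(p)]
                for r in range(p)]
        out = matmul(step, out)
    return out

def check_4_12(n=8, seed=3):
    """Return V_{n+1} from iterating (4.2) directly and Q_{1,n} V_1 + A_n from (4.11)-(4.12)."""
    rng = random.Random(seed)
    p = 2
    mus = [1.0 / j for j in range(1, n + 1)]                                   # mu_j = 1/j
    Xs = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    Fs = [[[x[r] * x[c] for c in range(p)] for r in range(p)] for x in Xs]   # F_j = X_j X_j'
    Cs = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]         # arbitrary C_j
    V1 = [1.0, -2.0]
    V = V1[:]
    for j in range(n):                         # (4.2); list position j holds index j + 1
        FV = matvec(Fs[j], V)
        V = [V[i] - mus[j] * FV[i] + mus[j] * Cs[j][i] for i in range(p)]
    A = [0.0] * p
    for k in range(1, n + 1):                  # (4.11)
        term = matvec(Q(k + 1, n, mus, Fs), Cs[k - 1])
        A = [A[i] + mus[k - 1] * term[i] for i in range(p)]
    QV = matvec(Q(1, n, mus, Fs), V1)
    return V, [QV[i] + A[i] for i in range(p)]

direct, via_4_12 = check_4_12()
print(direct, via_4_12)    # the two vectors agree up to rounding
```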


THEOREM. Suppose that the following assumptions (including the structure implied by (4.1)-(4.14)) are satisfied:

A1) $\{\mu_k\}$ is a nonincreasing sequence of positive constants converging to zero such that, whenever $|k - l| < N$, $\mu_k/\mu_l < h_N < \infty$; also $\sum_{k=1}^{\infty}\mu_k = \infty$;

A2) $\mu_k\|F_k\| \xrightarrow{a.s.} 0$ as $k \to \infty$;

A3) $\frac{1}{n}\sum_{k=1}^{n}F_k \xrightarrow{a.s.} R_{xx}$ as $n \to \infty$;

A4) there exists a random vector $S \in R^p$ such that $S_n = \sum_{k=1}^{n}\mu_k C_k \xrightarrow{a.s.} S$ as $n \to \infty$; and

A5) $|F_n(S - S_{n-1})| \xrightarrow{a.s.} 0$ as $n \to \infty$.

Then $|V_n| \xrightarrow{a.s.} 0$ as $n \to \infty$.

Of the assumptions A1 through A5, A1 is seemingly the only one similar in spirit to other stochastic approximation results. It is easily satisfied by $\mu_k = 1/k$, for example, with $h_N = N + 1$, since $\mu_k/\mu_l = l/k < 1 + N/k \le 1 + N$ whenever $|k - l| < N$.

Assumption A3 is the only other readily recognized assumption, and it can be interpreted as a kind of ergodicity assumption. Assumptions A2 through A5 all involve the a.s. convergence of sequences of random variables, and the conclusion of the theorem is the a.s. convergence of still another sequence of random variables. The principal advantage of this approach is that assumptions A3 through A5 are in a form suitable for (but not limited to) application of the results of Serfling ([58] and [59]). The end result is a set of sufficient conditions, on the "decay rate" of the autocovariance functions of the sequences $\{F_k\}$ and $\{P_k\}$, which imply A3 through A5. Section IV-B gives examples in which these results are applied to the algorithms discussed in Chapter II.
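The sketch below illustrates A3 in a correlated setting. It assumes a scalar ($p = 1$) Gaussian AR(1) sequence $x_k = a x_{k-1} + e_k$ and takes $F_k = x_k^2$. For this hypothetical model $R_{xx} = 1/(1-a^2)$, and the running average in A3 should approach that value even though successive $F_k$ are correlated. The function name is hypothetical.

```python
import random

def ar1_second_moment_average(n, a=0.5, seed=7):
    """(1/n) * sum_{k=1}^{n} F_k with F_k = x_k^2 (p = 1), where x_k = a x_{k-1} + e_k is a
    hypothetical zero-mean Gaussian AR(1) sequence driven by unit-variance white noise e_k.
    Under A3 this average should approach R_xx = 1 / (1 - a^2)."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, (1.0 / (1.0 - a * a)) ** 0.5)   # start in the stationary distribution
    total = 0.0
    for _ in range(n):
        x = a * x + rng.gauss(0.0, 1.0)
        total += x * x
    return total / n

avg = ar1_second_moment_average(200000)
print(avg, 1.0 / (1.0 - 0.25))    # both close to 4/3
```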

As mentioned previously, the proof of the above theorem relies heavily on the techniques of Albert and Gardner [57]. It is a direct modification of the proof of Theorem 6.3 of [57]. However, the algorithm treated in that theorem is quite different from (3.1), and the assumptions above are seemingly less restrictive. Several useful lemmas are established before the theorem is proved. Lemmas 1 and 2 are similar in nature to Theorem 6.1 of [57]; they use assumptions A1, A2, and A3 to show that $\|Q_{1,n}\| \xrightarrow{a.s.} 0$ as $n \to \infty$. The assumption that each $F_n$ is symmetric and non-negative definite can be relaxed by applying Theorem 6.1 of [57]. In adaptive signal processing applications, however, $F_n$ is almost always some form of sample covariance estimate, so the simplification that results from symmetric $F_n$ seems worthwhile.

LEMMA 1. If A1-A3 are satisfied, then there exists a sequence of integers $\{\nu_k\}$ with $1 = \nu_1 < \nu_2 < \nu_3 < \cdots$ with the following properties. Let

$$p_k = \nu_{k+1} - \nu_k, \qquad J_k = \{\nu_k, \nu_k + 1, \ldots, \nu_{k+1} - 1\}.$$

Then:

(i) $p_{\min} \le p_k \le p_{\max} < \infty$;

(ii) $p_k^{-1}\lambda_{\min}\big(\sum_{j \in J_k} F_j\big) = \alpha_k > 0$ a.s.;

(iii) $p_k^{-1}\lambda_{\max}\big(\sum_{j \in J_k} F_j\big) = \gamma_k < \gamma < \infty$ a.s.; and

(iv) there exists a $\delta > 0$ such that $\alpha_k \ge \delta$ a.s.

The sequences $\{\nu_k\}$, $\{p_k\}$, $\{\alpha_k\}$, and $\{\gamma_k\}$ may all be random sequences depending on the particular realization of the sequence $\{F_k\}$.

PROOF. Define

$$R_n^{\ell} = \frac{1}{n}\sum_{k=\ell+1}^{\ell+n}F_k.$$

Let $\epsilon$ be given such that $0 < \epsilon < \lambda_{\min}(R_{xx})$. Assumptions A1-A3 imply that, for any fixed $\ell \in \{0,1,2,\ldots\}$, $\lim_{n\to\infty}R_n^{\ell} = R_{xx}$ a.s. It follows that $\lim_{n\to\infty}\lambda_{\min}(R_n^{\ell}) = \lambda_{\min}(R_{xx})$ a.s. Hence there exists an $n_\ell$ (possibly random) such that $|\lambda_{\min}(R_{xx}) - \lambda_{\min}(R_{n_\ell}^{\ell})| < \epsilon$ a.s., and thus $0 < \lambda_{\min}(R_{xx}) - \epsilon < \lambda_{\min}(R_{n_\ell}^{\ell})$ a.s. Since $n_\ell$ is finite and $\ell$ is arbitrary, (i), (ii), and (iv) follow. A similar argument applies to (iii). Q.E.D.


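LEMMA 2. If A1, A2, and A3 are satisfied, then $\|Q_{1,n}\| \xrightarrow{a.s.} 0$ as $n \to \infty$.

PROOF. By A2, there exists a random variable $M$, $1 \le M < \infty$, such that $\sup_k\|I - \mu_kF_k\| \le M$. Keep the notation of Lemma 1. For any $n$, let $K = K(n)$ be the largest integer such that $\nu_K \le n$, so that $\nu_K \le n \le \nu_{K+1} - 1$. Then

$$\|Q_{1,n}\| \le \|Q_{\nu_K,n}\|\cdot\|Q_{1,\nu_K-1}\| \le M^{p_{\max}}\|Q_{1,\nu_K-1}\|. \qquad (4.17)$$

Consequently, it suffices to show that $\|Q_{1,\nu_K-1}\| \xrightarrow{a.s.} 0$ as $K \to \infty$, with $n$ running over some subset of the positive integers. Note that

$$Q_{1,\nu_K-1} = \prod_{k=1}^{K-1}\prod_{j=\nu_k}^{\nu_{k+1}-1}(I - \mu_jF_j) = \prod_{k=1}^{K-1}Q_{\nu_k,\nu_{k+1}-1}. \qquad (4.18)$$

Defining $\Gamma_k = Q_{\nu_k,\nu_{k+1}-1}$, this becomes

$$Q_{1,\nu_K-1} = \prod_{k=1}^{K-1}\Gamma_k. \qquad (4.19)$$

Expand $\Gamma_k$ as

$$\Gamma_k = I - \sum_{j\in J_k}\mu_jF_j + \sum_{q=2}^{p_k}(-1)^q\sum_{\substack{\ell_1>\ell_2>\cdots>\ell_q\\ \ell_1,\ldots,\ell_q\in J_k}}\mu_{\ell_1}\mu_{\ell_2}\cdots\mu_{\ell_q}F_{\ell_1}F_{\ell_2}\cdots F_{\ell_q}. \qquad (4.20)$$

It then follows from Lemma 1 that, whenever $\mu_{\nu_k}p_k\gamma_k \le 1$,

$$\|\Gamma_k\| \le \Big\|I - \sum_{j\in J_k}\mu_jF_j\Big\| + \sum_{q=2}^{p_k}\Big(\mu_{\nu_k}\sum_{j\in J_k}\|F_j\|\Big)^q \le 1 - \mu_{\nu_{k+1}-1}p_k\alpha_k + \sum_{q=2}^{p_k}\left(\mu_{\nu_k}\,p\,p_{\max}\gamma\right)^q. \qquad (4.21)$$

Here $p$ is the dimension, and the last step uses $\sum_{j\in J_k}\|F_j\| \le \mathrm{tr}\big(\sum_{j\in J_k}F_j\big) \le p\,p_k\gamma_k$. The terms with $q \ge 2$ are $O(\mu_{\nu_k}^2)$, while by A1 $\mu_{\nu_k}/\mu_{\nu_{k+1}-1} < h_{p_{\max}}$. Hence there exists a positive integer $k_o$ (possibly random) such that, for all $k \ge k_o$,

$$\|\Gamma_k\| \le 1 - \tfrac{1}{2}\mu_{\nu_{k+1}-1}p_k\alpha_k \le 1 - \tfrac{1}{2}\mu_{\nu_{k+1}-1}p_{\min}\delta \le \exp\left\{-\tfrac{1}{2}\mu_{\nu_{k+1}-1}p_{\min}\delta\right\} \quad \text{a.s.}, \qquad (4.22)$$

since $1 - x \le e^{-x}$ for all real $x$. Hence there exists a random variable $M_1$ such that, for all $K > k_o$,

$$\|Q_{1,\nu_K-1}\| \le M_1\prod_{k=k_o}^{K-1}\|\Gamma_k\| \le M_1\exp\Big\{-\tfrac{1}{2}p_{\min}\delta\sum_{k=k_o}^{K-1}\mu_{\nu_{k+1}-1}\Big\}. \qquad (4.23)$$

Since $\{\mu_k\}$ is nonincreasing and $p_{k+1} \le p_{\max}$, we have $p_{\max}\mu_{\nu_{k+1}-1} \ge \sum_{j\in J_{k+1}}\mu_j$. By A1, the sum in (4.23) therefore diverges, and it follows that $\|Q_{1,n}\| \xrightarrow{a.s.} 0$. Q.E.D.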

LEMMA 3. (Albert and Gardner [57]). Let $\{A_i\}$ be a sequence of square matrices. Then, for all $1 \le k \le n$,

$$\sum_{j=k}^{n}\Big[\prod_{i=j+1}^{n}(I - A_i)\Big]A_j = I - \prod_{i=k}^{n}(I - A_i). \qquad (4.24)$$
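Identity (4.24) is a telescoping sum. It holds for non-commuting matrices, provided the products follow the ordering convention (4.8). Below is a small numerical check with random $2 \times 2$ matrices; the helper names are hypothetical.

```python
import random

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(p):
    return [[1.0 if r == c else 0.0 for c in range(p)] for r in range(p)]

def ordered_prod(mats, p):
    """prod_{i=l}^{k} M_i = M_k ... M_l as in (4.8); mats lists M_l, ..., M_k in index order."""
    out = identity(p)
    for M in mats:
        out = matmul(M, out)              # the later index multiplies on the left
    return out

def lemma3_sides(k, n, p=2, seed=11):
    """Both sides of (4.24) for random, generally non-commuting, p x p matrices A_k, ..., A_n."""
    rng = random.Random(seed)
    A = {i: [[rng.uniform(-0.5, 0.5) for _ in range(p)] for _ in range(p)] for i in range(k, n + 1)}
    I = identity(p)
    IA = {i: [[I[r][c] - A[i][r][c] for c in range(p)] for r in range(p)] for i in A}
    lhs = [[0.0] * p for _ in range(p)]
    for j in range(k, n + 1):
        term = matmul(ordered_prod([IA[i] for i in range(j + 1, n + 1)], p), A[j])
        lhs = [[lhs[r][c] + term[r][c] for c in range(p)] for r in range(p)]
    P = ordered_prod([IA[i] for i in range(k, n + 1)], p)
    rhs = [[I[r][c] - P[r][c] for c in range(p)] for r in range(p)]
    return lhs, rhs

lhs, rhs = lemma3_sides(3, 9)
print(lhs)
print(rhs)    # equal up to rounding
```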


LEMMA 4. (Toeplitz Lemma [60]). Suppose $x_n \to x$ and the coefficients $a_{ni}$ satisfy:

(i) for fixed $i \ge 1$, $a_{ni} \to 0$ as $n \to \infty$;

(ii) there exists a $K$ such that, for all $n \ge 1$, $\sum_{i=1}^{n}|a_{ni}| < K$; and

(iii) $\sum_{i=1}^{n}a_{ni} = A_n \to a$ as $n \to \infty$.

Then

$$x_n' = \sum_{i=1}^{n}a_{ni}x_i \to a x. \qquad (4.25)$$
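Here is a minimal illustration of Lemma 4, using the sequence $x_i = 1 + 1/i \to 1$. It assumes two simple weight arrays: the Cesàro weights $a_{ni} = 1/n$, for which (i)-(iii) hold with $a = 1$, and the scaled weights $a_{ni} = 2/n$, for which $a = 2$. The helper name is hypothetical.

```python
def toeplitz_row(x, weights):
    """x'_n = sum_{i=1}^{n} a_{ni} x_i for a single row n of the weight array."""
    return sum(a * xi for a, xi in zip(weights, x))

n = 100000
x = [1.0 + 1.0 / i for i in range(1, n + 1)]       # x_i -> x = 1
cesaro = toeplitz_row(x, [1.0 / n] * n)            # a_{ni} = 1/n, a = 1: limit a*x = 1
doubled = toeplitz_row(x, [2.0 / n] * n)           # a_{ni} = 2/n, a = 2: limit a*x = 2
print(cesaro, doubled)
```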


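PROOF of THEOREM. Equation (4.12) expresses $V_{n+1}$ as $V_{n+1} = Q_{1,n}V_1 + A_n$. Lemma 2 shows that $\|Q_{1,n}\| \xrightarrow{a.s.} 0$, so it remains to show that $|A_n| \xrightarrow{a.s.} 0$. From (4.11) and A4, with $S_0 = 0$ and $Q_{n+1,n} = I$, $A_n$ is first rewritten by summation by parts. Keep the notation of Lemmas 1 and 2, with $K = K(n)$ the largest integer such that $\nu_K \le n$, so that $\nu_K \le n < \nu_{K+1}$. The resulting expression is grouped over the blocks $J_k$ and bounded in terms of a sequence $\{d_k\}$ of block quantities. By A5, $d_k \xrightarrow{a.s.} 0$ as $K \to \infty$, with $n$ running over some subset of the positive integers. It then remains only to show that the block sum on the left-hand side of (4.35) below converges a.s. to zero.

Define $\beta_k = \tfrac{1}{2}\mu_{\nu_{k+1}-1}p_{\min}\delta$. By (4.22) in the proof of Lemma 2, there exists a $k_o$ such that $\|\Gamma_k\| \le 1 - \beta_k$ a.s. for all $k \ge k_o$. Assume $k_o$ is large enough that $\beta_k < 1$ for all $k \ge k_o$. Then, for all $K > k_o$,

$$\begin{aligned}\sum_{k=1}^{K-1}\Big(\prod_{i=k+1}^{K-1}\|\Gamma_i\|\Big)\mu_{\nu_k}d_k &= \sum_{k=1}^{k_o-1}\prod_{i=k+1}^{k_o-1}\|\Gamma_i\|\prod_{i=k_o}^{K-1}\|\Gamma_i\|\,\mu_{\nu_k}d_k + \sum_{k=k_o}^{K-1}\prod_{\ell=k+1}^{K-1}\|\Gamma_\ell\|\,\mu_{\nu_k}d_k \\ &\overset{a.s.}{\le} M_2\prod_{i=k_o}^{K-1}(1-\beta_i)\sum_{k=1}^{k_o-1}\mu_{\nu_k}d_k + \sum_{k=k_o}^{K-1}\prod_{\ell=k+1}^{K-1}(1-\beta_\ell)\,\beta_k\left(\mu_{\nu_k}d_k\beta_k^{-1}\right),\end{aligned} \qquad (4.35)$$

where $M_2 = \max_{1\le k<k_o}\prod_{i=k+1}^{k_o-1}\|\Gamma_i\|$ is a finite random variable. The first term on the right-hand side of (4.35) tends to zero a.s., since $\prod_{i=k_o}^{K-1}(1-\beta_i) \le \exp\{-\sum_{i=k_o}^{K-1}\beta_i\} \to 0$, as in Lemma 2. Define

$$a_{ni} = \prod_{\ell=i+1}^{n}(1 - \beta_\ell)\,\beta_i. \qquad (4.36)$$

Clearly, for all fixed $i \ge k_o$, $a_{ni} \to 0$ as $n \to \infty$. From Lemma 3,

$$\sum_{i=k_o}^{n}|a_{ni}| = \sum_{i=k_o}^{n}\prod_{\ell=i+1}^{n}(1-\beta_\ell)\,\beta_i = 1 - \prod_{i=k_o}^{n}(1-\beta_i), \qquad (4.37)$$

which converges a.s. to 1 as $n \to \infty$. By Lemma 4,

$$\lim_{K\to\infty}\sum_{k=k_o}^{K-1}a_{K-1,k}\left(\mu_{\nu_k}d_k\beta_k^{-1}\right) \overset{a.s.}{=} \lim_{k\to\infty}\left(\mu_{\nu_k}d_k\beta_k^{-1}\right). \qquad (4.38)$$

From A1 and the definition of $\beta_k$,

$$\mu_{\nu_k}d_k\beta_k^{-1} = \frac{2\mu_{\nu_k}d_k}{\mu_{\nu_{k+1}-1}p_{\min}\delta} \le \frac{2h_{p_{\max}}}{p_{\min}\delta}\,d_k, \qquad (4.39)$$

and hence, from A5,

$$\lim_{k\to\infty}\mu_{\nu_k}d_k\beta_k^{-1} \overset{a.s.}{=} 0. \qquad (4.40)$$

Q.E.D.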
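A summation-by-parts identity makes explicit how A4 and A5 enter the proof. The derivation below is a sketch that uses only (4.10), (4.11), $S_0 = 0$, and $Q_{n+1,n} = I$. It is offered as an aid to the reader and does not claim to reproduce the author's intermediate equations. Note that it holds for any vector $S$:

$$\begin{aligned}
\mu_k C_k &= S_k - S_{k-1} = (S - S_{k-1}) - (S - S_k), \\
Q_{k+1,n} &= Q_{k,n} + Q_{k+1,n}\,\mu_k F_k \qquad \big(\text{since } Q_{k,n} = Q_{k+1,n}(I - \mu_k F_k)\big), \\
A_n &= \sum_{k=1}^{n} Q_{k+1,n}\big[(S - S_{k-1}) - (S - S_k)\big] \\
    &= \sum_{k=1}^{n} Q_{k,n}(S - S_{k-1}) - \sum_{k=1}^{n} Q_{k+1,n}(S - S_k) + \sum_{k=1}^{n} Q_{k+1,n}\,\mu_k F_k(S - S_{k-1}) \\
    &= Q_{1,n}S - (S - S_n) + \sum_{k=1}^{n} Q_{k+1,n}\,\mu_k F_k(S - S_{k-1}).
\end{aligned}$$

The first term vanishes a.s. by Lemma 2 and the second by A4. The summands of the last term involve exactly the quantities $F_k(S - S_{k-1})$ that A5 controls.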


With the theorem established, attention now turns to corollaries that guarantee, under extremely realistic conditions, that assumptions A1 through A5 are satisfied. It is convenient to use the order notation $O(\cdot)$; e.g., $f(n) = O(g(n))$ if $f(n)/g(n)$ is bounded as $n \to \infty$.

Assumptions A1 through A5 simplify usefully when $\|F_n\|$ is a.s. bounded. In this case, the following corollary is easily established.

COROLLARY 1. If $\mu_k = O(k^{-1})$, $\inf_k k\mu_k > 0$, and $\|F_k\|$ is a.s. bounded, then A1, A2, and A5 may be deleted and the theorem remains true.

PROOF. It suffices to consider $\mu_k = k^{-1}$, for which Assumption A1 is trivially satisfied. To see that A4 implies A5, note that there exists an $M < \infty$ a.s. such that $|F_n(S - S_{n-1})| \le \|F_n\|\cdot|S - S_{n-1}| \le M|S - S_{n-1}|$ a.s. Hence $|S - S_{n-1}| \xrightarrow{a.s.} 0$ implies $|F_n(S - S_{n-1})| \xrightarrow{a.s.} 0$ as $n \to \infty$. Assumption A2 follows from the Borel-Cantelli Lemma and the Chebychev inequality. For all $\epsilon > 0$,

$$\Pr\{\mu_k\|F_k\| > \epsilon\} = \Pr\{\|F_k\| > \epsilon\mu_k^{-1}\} \le \mu_k^2\epsilon^{-2}E\{\|F_k\|^2\},$$

and since $k^{-2}$ is summable, $\mu_k\|F_k\| \xrightarrow{a.s.} 0$ as $k \to \infty$. Q.E.D.

The Borel-Cantelli Lemma, combined with a probabilistic bound such as the Chebychev inequality, the Markov inequality, or the Chernoff bound, is a frequently used technique for establishing the a.s. convergence of sequences of random variables. Unfortunately, the available probabilistic bounds often approach zero but are not summable (unlike the case in Corollary 1). The work of Serfling ([58] and [59]) provides useful techniques for overcoming this difficulty. For a more complete treatment of a.s. convergence, the interested reader is referred to the recent text by Stout [61]. Before the machinery needed for the proof of Corollary 2 is developed, the a.s. convergence of $|S - S_{n-1}|$ is discussed to illustrate the concepts involved.

For all $\epsilon > 0$, the Chebychev inequality gives

$$\Pr\{|S - S_{n-1}| > \epsilon\} \le \epsilon^{-2}E\{|S - S_{n-1}|^2\}, \qquad (4.41)$$

where it is assumed that $E\{|S - S_{n-1}|^2\} < \infty$. Formally,

$$S - S_{n-1} = \sum_{k=n}^{\infty}\mu_k C_k, \qquad (4.42)$$

so that

$$E\{|S - S_{n-1}|^2\} = \sum_{k=n}^{\infty}\sum_{\ell=n}^{\infty}\mu_k\mu_\ell\,E\{C_k'C_\ell\}. \qquad (4.43)$$

Suppose for the moment that $E\{C_k'C_\ell\} = \delta_{k\ell}$ and that $\mu_k = O(k^{-1})$, where $\delta_{k\ell}$ is the Kronecker delta function

$$\delta_{k\ell} = \begin{cases}1, & \text{if } k = \ell,\\ 0, & \text{if } k \ne \ell.\end{cases} \qquad (4.44)$$

Then $E\{|S - S_{n-1}|^2\} = O(n^{-1})$, which is seemingly the fastest rate one can expect. It is therefore fruitless to apply the Borel-Cantelli Lemma directly to (4.41) to obtain the a.s. convergence of $|S - S_{n-1}|$. Although summability of $E\{|S - S_{n-1}|^2\}$ seems impossible, it is reasonable to require that $E\{|S - S_{n-1}|^2\} \to 0$ as $n \to \infty$. Mean-square convergence and a.s. convergence are not equivalent, but in view of A4 it does not seem unduly restrictive to require that $S_n \xrightarrow{m.s.} S$.
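The claim that $E\{|S - S_{n-1}|^2\} = O(n^{-1})$ is not summable can be checked directly in the idealized case $E\{C_k'C_\ell\} = \delta_{k\ell}$, $\mu_k = 1/k$. In that case (4.43) reduces to $\sum_{k\ge n}k^{-2}$. The helper name below is hypothetical.

```python
import math

def tail_inv_squares(n):
    """sum_{k=n}^{infinity} k^{-2}, i.e. E{|S - S_{n-1}|^2} in (4.43) when E{C_k'C_l} = delta_kl and mu_k = 1/k."""
    return math.pi ** 2 / 6.0 - sum(1.0 / (k * k) for k in range(1, n))

for n in (10, 100, 1000):
    print(n, n * tail_inv_squares(n))          # tends to 1: the tail decays like 1/n

# Summing the Chebychev bounds (4.41) over n (with eps = 1) grows like a harmonic series.
partial = sum(tail_inv_squares(n) for n in range(1, 2001))
print(partial, math.log(2000))
```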
Suppose that $E\{|S - S_{n-1}|^2\} \to 0$ as $n \to \infty$. Then there exists an increasing subsequence $\{n_k\}$ with $n_k \to \infty$ as $k \to \infty$ such that

$$\sum_{k=1}^{\infty}E\{|S - S_{n_k-1}|^2\} < \infty, \quad \text{hence } |S - S_{n_k-1}| \xrightarrow{a.s.} 0 \text{ as } k \to \infty. \qquad (4.45)$$

This fact can be used as follows. Let $L_k = \{n_k, n_k+1, \ldots, n_{k+1}-1\}$. For all $n \in L_k$,

$$|S - S_{n-1}| = \Big|\sum_{i=n}^{n_{k+1}-1}\mu_iC_i + \sum_{i=n_{k+1}}^{\infty}\mu_iC_i\Big| \le \max_{\ell\in L_k}\Big|\sum_{i=\ell}^{n_{k+1}-1}\mu_iC_i\Big| + |S - S_{n_{k+1}-1}|. \qquad (4.46)$$

Suppose $\{n_k\}$ satisfies (4.45) and also

$$\max_{\ell\in L_k}\Big|\sum_{i=\ell}^{n_{k+1}-1}\mu_iC_i\Big| \xrightarrow{a.s.} 0 \qquad (4.47)$$

as $k \to \infty$. Then $|S - S_{n-1}| \xrightarrow{a.s.} 0$ as $n \to \infty$. The work of Serfling ([58] and [59]) is easily applied to terms like (4.47).

The following lemma is a multidimensional version of Theorem A of [58]. It will prove invaluable for establishing conditions similar to (4.47). Its proof is a simple modification of that given in [58] and is omitted.


LEMMA 5. (Serfling [58]). Let $\{x_i\}$ be a sequence of random vectors, $x_i \in R^p$, with finite "variances" $\sigma_i^2 = E\{(x_i - E\{x_i\})'(x_i - E\{x_i\})\}$. For each matrix $X_{a,n} = (x_{a+1}, \ldots, x_{a+n})$ of $n$ consecutive $x_i$'s, let $F_{a,n}$ denote the joint distribution function and let

$$S_{a,n} = \sum_{i=a+1}^{a+n}x_i, \qquad (4.48)$$

$M_{a,n} = \max\{|S_{a,1}|, \ldots, |S_{a,n}|\}$, and let $g(F_{a,n})$ be a functional depending on $F_{a,n}$. Let $a_o$ be an arbitrary but fixed integer and let $\nu \ge 2$. Suppose that $g(F_{a,k}) + g(F_{a+k,l}) \le g(F_{a,k+l})$ for all $a \ge a_o$ and $1 \le k < k + l$. Suppose also that $E\{|S_{a,n}|^\nu\} \le g^{\nu/2}(F_{a,n})$ for all $a \ge a_o$ and all $n \ge 1$. Then $E\{M_{a,n}^\nu\} \le (\log_2 2n)^\nu g^{\nu/2}(F_{a,n})$ for all $a \ge a_o$ and all $n \ge 1$.


A rather  straightforward  modification  of  Lemma  5 will  also  be 
needed  and  is  presented  below  as  Lemma  6.  The  proof  of  Lemma  6 is 
virtually  identical  to  that  of  Lemma  5 and  thus  will  be  omitted. 


LEMMA 6. Let $\{x_i\}$ be a sequence of random vectors, $x_i \in R^p$, with finite "variances" $\sigma_i^2 = E\{(x_i - E\{x_i\})'(x_i - E\{x_i\})\}$. For each matrix $X_{a,n} = (x_{a-n+1}, \ldots, x_a)$ of $n$ consecutive $x_i$'s, let $F_{a,n}$ denote the joint distribution function and let

$$S_{a,n} = \sum_{i=a-n+1}^{a}x_i, \qquad (4.49)$$

$M_{a,n} = \max\{|S_{a,1}|, \ldots, |S_{a,n}|\}$, and let $g(F_{a,n})$ be a functional depending on $F_{a,n}$. Let $a_o$ be an arbitrary but fixed integer and let $\nu \ge 2$. Suppose that $g(F_{a,k}) + g(F_{a-k,l}) \le g(F_{a,k+l})$ for $1 \le k < k + l \le a - a_o$. Suppose also that $E\{|S_{a,n}|^\nu\} \le g^{\nu/2}(F_{a,n})$ for all $1 \le n \le a - a_o$. Then $E\{M_{a,n}^\nu\} \le (\log_2 2n)^\nu g^{\nu/2}(F_{a,n})$ for all $1 \le n \le a - a_o$.

Lemma 7 below uses common procedures to bound double sums of symmetric functions, such as autocorrelation functions. Its results will prove invaluable in establishing "slowest decay rates" of the autocovariance functions of $\{F_k\}$ and $\{P_k\}$ for "gain sequences" $\{\mu_k\}$ of the form $\mu_k = O(k^{-1})$. The technique of proof allows some flexibility in the choice of the sequence $\{\mu_k\}$.


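LEMMA 7. Let $\alpha_{k,\ell} = \alpha_{\ell,k}$ and $\rho(k,\ell) = \rho(\ell,k)$ be real-valued functions defined for all non-negative integers $k$, $\ell$. For $1 \le n \le m$, define

$$\gamma_{n,m} = \sum_{k=n}^{m}\sum_{\ell=n}^{m}\alpha_{k,\ell}\,\rho(k,\ell). \qquad (4.50)$$

Then

(a) $$\gamma_{n,m} = 2\sum_{u=1}^{m-n}\sum_{k=n}^{m-u}\alpha_{k,k+u}\,\rho(k,k+u) + \sum_{k=n}^{m}\alpha_{k,k}\,\rho(k,k).$$

Suppose further that there exists a real-valued function $f(u)$ with $f(u) = O(u^{-\nu})$ such that $|\rho(k,k+u)| \le f(u)$ for all $u = 0,1,2,\ldots$ and all $k = 1,2,\ldots$. If $\alpha_{k,\ell} = 1$ and $\nu = 1$, then, for large $m - n$,

(b) $$|\gamma_{n,m}| = O\big((m-n)\ln(m-n)\big).$$

Finally, if $\alpha_{k,\ell} = \mu_k\mu_\ell$, $\mu_k = O(k^{-1})$, and $\nu > 1$, then

(c) $$|\gamma_{n,\infty}| = O\big(n^{-\nu/(\nu+2)}\big).$$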


PROOF. Let $u = k - \ell$ in (4.50). The index ranges are then:

- For $u = n-m, n+1-m, \ldots, -1$: $k = n, n+1, \ldots, u+m$.
- For $u = 0$: $k = n, n+1, \ldots, m$.
- For $u = 1, 2, \ldots, m-n$: $k = u+n, u+n+1, \ldots, m$.

Substituting into (4.50) gives

$$\gamma_{n,m} = \sum_{u=n-m}^{-1}\sum_{k=n}^{m+u}\alpha_{k,k-u}\,\rho(k,k-u) + \sum_{k=n}^{m}\alpha_{k,k}\,\rho(k,k) + \sum_{u=1}^{m-n}\sum_{k=n+u}^{m}\alpha_{k,k-u}\,\rho(k,k-u). \qquad (4.51)$$

Making the transformation $k' = k - u$ in the last series and using the symmetry relations, (a) follows.
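Identity (a) can be checked numerically for arbitrary symmetric $\alpha_{k,\ell}$ and $\rho(k,\ell)$. The sketch below draws both at random, memoizing the draws so that symmetry holds. The helper names are hypothetical.

```python
import random

def gamma_direct(alpha, rho, n, m):
    """(4.50): double sum over k, l in [n, m]."""
    return sum(alpha(k, l) * rho(k, l) for k in range(n, m + 1) for l in range(n, m + 1))

def gamma_lag_form(alpha, rho, n, m):
    """Right-hand side of Lemma 7(a): each off-diagonal lag counted twice, plus the diagonal."""
    off = sum(alpha(k, k + u) * rho(k, k + u)
              for u in range(1, m - n + 1) for k in range(n, m - u + 1))
    diag = sum(alpha(k, k) * rho(k, k) for k in range(n, m + 1))
    return 2 * off + diag

rng = random.Random(5)

def random_symmetric(table):
    """A random function of (k, l), memoized in `table` so that f(k, l) == f(l, k)."""
    def f(k, l):
        key = (min(k, l), max(k, l))
        if key not in table:
            table[key] = rng.uniform(-1.0, 1.0)
        return table[key]
    return f

alpha = random_symmetric({})
rho = random_symmetric({})
d = gamma_direct(alpha, rho, 3, 20)
g = gamma_lag_form(alpha, rho, 3, 20)
print(d, g)    # equal up to rounding
```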

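Suppose that $\alpha_{k,\ell} = 1$ and $|\rho(k,k+u)| \le f(u) = O(u^{-1})$ for all $u = 0,1,2,\ldots$ and all $k = 1,2,\ldots$. Then

$$|\gamma_{n,m}| \le 2\sum_{u=1}^{m-n}f(u)(m-u-n+1) + (m-n+1)f(0). \qquad (4.52)$$

Let $C_1 = \max_{1\le u\le m-n}f(u)$, and choose $\ell$ with $1 \le \ell < m-n$ large enough that $f(u) \le C_2u^{-1}$ for all $u > \ell$, for some $C_2 > 0$. Then

$$|\gamma_{n,m}| \le 2C_1(m-n)\ell + 2C_2(m-n+1)\sum_{u=\ell+1}^{m-n}\frac{1}{u} - 2C_2(m-n-\ell) + (m-n+1)f(0). \qquad (4.53)$$

For some $C_3 > 0$, this yields

$$|\gamma_{n,m}| \le 2C_1(m-n)\ell + 2C_2(m-n+1)\ln\!\left(\frac{m-n}{\ell}\right) + 2C_2\ell + (m-n)C_3, \qquad (4.54)$$

since

$$\sum_{u=\ell+1}^{m-n}\frac{1}{u} \le \int_{\ell}^{m-n}\frac{dx}{x} = \ln\!\left(\frac{m-n}{\ell}\right). \qquad (4.55)$$

Letting $\ell = \ln(m-n)$, it follows that $|\gamma_{n,m}| = O((m-n)\ln(m-n))$ for large $m-n$.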
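To illustrate the rate in (b), the sketch below assumes the hypothetical stationary choice $\rho(k,\ell) = f(|k-\ell|)$ with $f(u) = 1/(1+u)$, which satisfies $f(u) = O(u^{-1})$. It evaluates $\gamma_{n,m}$ through the lag form (a). The ratio $\gamma_{n,m}/((m-n)\ln(m-n))$ stays bounded as $m - n$ grows.

```python
import math

def gamma_unit_weights(N):
    """gamma_{n,m} with alpha = 1, m - n = N and rho(k, l) = f(|k - l|), f(u) = 1/(1 + u),
    evaluated through the lag form (a): lag u >= 1 appears N + 1 - u times on each side."""
    f = lambda u: 1.0 / (1.0 + u)
    return 2 * sum(f(u) * (N + 1 - u) for u in range(1, N + 1)) + (N + 1) * f(0)

ratios = [gamma_unit_weights(N) / (N * math.log(N)) for N in (100, 1000, 10000, 100000)]
print(ratios)    # bounded (slowly approaching 2), consistent with O((m - n) ln(m - n))
```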

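Now suppose that $\alpha_{k,\ell} = \mu_k\mu_\ell$ with $\mu_k = O(k^{-1})$, and that there exists an $f(u) = O(u^{-\nu})$ with the desired properties. It suffices to consider only $\mu_k = k^{-1}$, so that

$$|\gamma_{n,m}| \le 2\sum_{u=1}^{m-n}f(u)\sum_{k=n}^{m-u}\frac{1}{k(k+u)} + f(0)\sum_{k=n}^{m}\frac{1}{k^2}. \qquad (4.56)$$

For all $1 \le u \le m-n$ ($n \ge 2$),

$$\sum_{k=n}^{m-u}\frac{1}{k(k+u)} \le \int_{n-1}^{m-u}\frac{dx}{x(x+u)} = \frac{1}{u}\ln\!\left(\frac{(m-u)(n+u-1)}{m(n-1)}\right). \qquad (4.57)$$

Similarly, for $2 \le n < m$,

$$\sum_{k=n}^{m}\frac{1}{k^2} \le \int_{n-1}^{m}\frac{dx}{x^2} = \frac{1}{n-1} - \frac{1}{m}. \qquad (4.58)$$

For all $1 \le u \le \ell < n < m$,

$$\ln\!\left(\frac{(m-u)(n+u-1)}{m(n-1)}\right) \le \ln\!\left(\frac{n+\ell-1}{n-1}\right); \qquad (4.59)$$

for all $1 \le \ell < u < n \le m-n$,

$$\ln\!\left(\frac{(m-u)(n+u-1)}{m(n-1)}\right) \le \ln 2; \qquad (4.60)$$

and for all $n \le u \le m-n$,

$$\ln\!\left(\frac{(m-u)(n+u-1)}{m(n-1)}\right) \le \ln\!\left(\frac{2u}{n-1}\right). \qquad (4.61)$$

Hence, for all $1 \le \ell < n-2 < m-n-2$, with $C_1 = \max_u f(u)$,

$$|\gamma_{n,m}| \le 2C_1\ell\ln\!\left(\frac{n+\ell-1}{n-1}\right) + 2\ln 2\sum_{u=\ell+1}^{n-1}\frac{f(u)}{u} + 2\sum_{u=n}^{m-n}\frac{f(u)}{u}\ln\!\left(\frac{2u}{n-1}\right) + f(0)\left(\frac{1}{n-1} - \frac{1}{m}\right). \qquad (4.62)$$

Since $f(u) = O(u^{-\nu})$, there exist a $C_2 > 0$ and an $\ell_o$ such that $f(u) \le C_2u^{-\nu}$ for all $u \ge \ell_o$. Hence, for $\ell \ge \ell_o$,

$$|\gamma_{n,m}| \le 2C_1\ell\ln\!\left(\frac{n+\ell-1}{n-1}\right) + 2C_2\ln 2\sum_{u=\ell+1}^{n-1}\frac{1}{u^{\nu+1}} + 2C_2\sum_{u=n}^{m-n}\frac{1}{u^{\nu+1}}\ln\!\left(\frac{2u}{n-1}\right) + f(0)\left(\frac{1}{n-1} - \frac{1}{m}\right). \qquad (4.63)$$

Thus, for some fixed $C_3$, $C_4$, $C_5$,

$$|\gamma_{n,m}| \le 2C_1\ell\ln\!\left(\frac{n+\ell-1}{n-1}\right) + C_3\ell^{-\nu} + C_4(n-1)^{-\nu} + C_5(n-1)^{-1}. \qquad (4.64)$$

Substitute $\ell = n^{\delta}$, $\delta > 0$, in (4.64) and use the fact that $\ln(1 + x) \le x$ for all $x > -1$. This gives

$$|\gamma_{n,\infty}| = O\big(n^{2\delta-1} + n^{-\delta\nu}\big). \qquad (4.65)$$

Finally, if $\delta = (\nu + 2)^{-1}$, then $|\gamma_{n,\infty}| = O(n^{-\nu/(\nu+2)})$. Q.E.D.

Enough machinery has now been developed to prove the following useful corollary.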


COROLLARY 2. Define

$$\rho_C(k,\ell) = E\{C_k'C_\ell\} \qquad (4.66)$$

and

$$\rho_F(k,\ell) = \|E\{F_kF_\ell\} - R_{xx}^2\|. \qquad (4.67)$$

Suppose that there exists a real-valued function $f(u)$ such that

$$\max\{|\rho_C(k,k+u)|,\ \rho_F(k,k+u)\} \le f(u) = O(u^{-\nu}) \quad (\nu > 1) \qquad (4.68)$$

for all positive integers $k$ and all non-negative integers $u$. Suppose further that $\mu_k = O(k^{-1})$, that $\inf_k k\mu_k > 0$, and that $E\{\|F_k\|^q\}$ is bounded for some $q > 2\nu^{-1}(\nu + 2)$. Then assumptions A1 through A5 of the theorem are satisfied, and hence $|V_n| \xrightarrow{a.s.} 0$ as $n \to \infty$.
PROOF. First, consider assumption A3. Define $S_{a,n}$ by

$$S_{a,n} = \sum_{k=a+1}^{a+n}(F_k - R_{xx})w, \qquad (4.69)$$

where $w \in R^p$. Clearly, A3 is satisfied if and only if $n^{-1}|S_{a,n}| \xrightarrow{a.s.} 0$ as $n \to \infty$ for all $w \in R^p$ and all $a = 1,2,\ldots$. Define $M_{a,n} = \max\{|S_{a,1}|, \ldots, |S_{a,n}|\}$. Let $\{n_k\}$ be an increasing sequence of positive integers with $n_k \to \infty$ as $k \to \infty$. For all $n$ with $n_k \le n < n_{k+1}$,

$$n^{-1}|S_{a,n}| \le n_k^{-1}|S_{a,n_k}| + n_k^{-1}M_{a+n_k,\,n_{k+1}-n_k}. \qquad (4.70)$$

Clearly,

$$E\{|S_{a,n}|^2\} = \sum_{k=a+1}^{a+n}\sum_{\ell=a+1}^{a+n}w'E\{F_kF_\ell - R_{xx}^2\}w \le |w|^2\sum_{k=a+1}^{a+n}\sum_{\ell=a+1}^{a+n}\rho_F(k,\ell) = O(n\ln n), \qquad (4.71)$$

by Lemma 7. Let $n_k = k^2$. Then $n_k^{-2}E\{|S_{a,n_k}|^2\}$ is summable by (4.71), and the Chebychev inequality and the Borel-Cantelli Lemma imply that $n_k^{-1}|S_{a,n_k}| \xrightarrow{a.s.} 0$ as $k \to \infty$. With $g(F_{a,n}) = E\{|S_{a,n}|^2\}$, Lemma 5 and (4.71) easily yield $E\{n_k^{-2}M_{a+n_k,\,n_{k+1}-n_k}^2\} = O((\ln k/k)^3)$, which is summable. Hence $n_k^{-1}M_{a+n_k,\,n_{k+1}-n_k} \xrightarrow{a.s.} 0$ as $k \to \infty$. By (4.70), $n^{-1}|S_{a,n}| \xrightarrow{a.s.} 0$ as $n \to \infty$, and A3 is satisfied.

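Now consider A5. Let $\{n_k\}$ be an increasing sequence of positive integers with $n_k \to \infty$ as $k \to \infty$, and let $L_k = \{n_k, n_k+1, \ldots, n_{k+1}-1\}$. For all $n \in L_k$,

$$\begin{aligned}|F_n(S - S_{n-1})| &= \Big|F_n\Big(\sum_{i=n}^{n_{k+1}-1}\mu_iC_i + S - S_{n_{k+1}-1}\Big)\Big| \\ &\le \|F_n\|\Big(\Big|\sum_{i=n}^{n_{k+1}-1}\mu_iC_i\Big| + |S - S_{n_{k+1}-1}|\Big) \\ &\le n_k^{-\beta}\max_{j\in L_k}\|F_j\|\cdot n_k^{\beta}\Big(\max_{\ell\in L_k}\Big|\sum_{i=\ell}^{n_{k+1}-1}\mu_iC_i\Big| + |S - S_{n_{k+1}-1}|\Big),\end{aligned} \qquad (4.72)$$

where $\beta > 0$ is, as yet, arbitrary. Now redefine

$$S_{a,n} = \sum_{i=a-n+1}^{a}\mu_iC_i \qquad (4.73)$$

and $M_{a,n} = \max\{|S_{a,1}|, \ldots, |S_{a,n}|\}$. Then (4.72) becomes

$$|F_n(S - S_{n-1})| \le n_k^{-\beta}\max_{j\in L_k}\|F_j\|\left(n_k^{\beta}M_{n_{k+1}-1,\,n_{k+1}-n_k} + n_k^{\beta}|S - S_{n_{k+1}-1}|\right). \qquad (4.74)$$

Since

$$E\{|S_{a,n}|^2\} = \sum_{i=a-n+1}^{a}\sum_{j=a-n+1}^{a}\mu_i\mu_j\,\rho_C(i,j), \qquad (4.75)$$

Lemma 6 applies with $g(F_{a,n}) = E\{|S_{a,n}|^2\}$, so that

$$E\{M_{n_{k+1}-1,\,n_{k+1}-n_k}^2\} \le \left(\log_2 2(n_{k+1}-n_k)\right)^2 g(F_{n_{k+1}-1,\,n_{k+1}-n_k}). \qquad (4.76)$$

By (c) of Lemma 7, $g(F_{n_{k+1}-1,\,n_{k+1}-n_k}) = O(n_k^{-\nu/(\nu+2)})$. By (4.43) and Lemma 7, $E\{|S - S_{n_{k+1}-1}|^2\} = O(n_{k+1}^{-\nu/(\nu+2)})$. Suppose $\alpha$ and $\beta > 0$ can be chosen such that

(i) $\sum_{k=1}^{\infty}(n_{k+1}-n_k)\,n_k^{-\beta q} < \infty$,

(ii) $\sum_{k=1}^{\infty}n_k^{2\beta}E\{|S - S_{n_{k+1}-1}|^2\} < \infty$, and

(iii) $\sum_{k=1}^{\infty}n_k^{2\beta}E\{M_{n_{k+1}-1,\,n_{k+1}-n_k}^2\} < \infty$.

Then the Markov inequality, the Borel-Cantelli Lemma, and (4.74) show that $|F_n(S - S_{n-1})| \xrightarrow{a.s.} 0$ as $n \to \infty$. It is easily verified that (i), (ii), and (iii) are satisfied for $n_k = k^{\alpha}$, $q^{-1} < \beta < \nu(2\nu+4)^{-1}$, and $\alpha > (\nu+2)(\nu - 2\beta(\nu+2))^{-1}$. Finally, $\mu_k = O(k^{-1})$, bounded $E\{\|F_n\|^q\}$, and $\inf_k k\mu_k > 0$ imply A1 and A2, while (ii) and (iii) imply A4. Q.E.D.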
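The exponent bookkeeping at the end of the proof can be sketched as follows for $n_k = k^{\alpha}$. The choices of $\beta$ (the midpoint of its interval) and of $\alpha$ (one unit above its lower bound) are arbitrary. Condition (i) is read as a union bound over $L_k$, as stated above. Each returned exponent must be less than $-1$ for the corresponding series to converge. The function name is hypothetical.

```python
def corollary2_exponents(nu, q):
    """Pick beta and alpha as in the proof of Corollary 2 (n_k = k**alpha) and return
    (beta, alpha, e_i, e_ii), where k**e_i and k**e_ii are the orders of the terms in
    condition (i) and in conditions (ii)/(iii); each exponent must be < -1.
    The log factor in (iii) does not affect summability when the exponent is < -1."""
    lo, hi = 1.0 / q, nu / (2.0 * nu + 4.0)          # q^{-1} < beta < nu (2 nu + 4)^{-1}
    if not (nu > 1 and lo < hi):
        raise ValueError("need nu > 1 and q > 2 (nu + 2) / nu")
    beta = 0.5 * (lo + hi)
    alpha = 1.0 + (nu + 2.0) / (nu - 2.0 * beta * (nu + 2.0))   # strictly above the stated bound
    # (i): (n_{k+1} - n_k) n_k^{-beta q} is of order k^{alpha - 1} k^{-alpha beta q}
    e_i = (alpha - 1.0) - alpha * beta * q
    # (ii), (iii): n_k^{2 beta} n_k^{-nu/(nu+2)} = k^{alpha (2 beta - nu/(nu+2))}
    e_ii = alpha * (2.0 * beta - nu / (nu + 2.0))
    return beta, alpha, e_i, e_ii

print(corollary2_exponents(2.0, 5.0))     # beta = 0.225, alpha about 21, both exponents < -1
print(corollary2_exponents(1.5, 10.0))
```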

B. Application of Corollary 2

This section applies the results of the previous section to the algorithms discussed in Chapter II. To apply Corollary 2, one must establish asymptotic decay rates on $\rho_C(k,\ell)$ and $\rho_F(k,\ell)$, as defined by (4.66) and (4.67). Define $\rho_P(k,\ell) = |E\{P_k'P_\ell\} - P'P|$. From (4.66), (4.4), and (4.5),

$$\begin{aligned}|\rho_C(k,\ell)| &= |E\{C_k'C_\ell\}| \\ &= |E\{P_k'P_\ell - w_o'F_kP_\ell - P_k'F_\ell w_o\} + w_o'E\{F_kF_\ell\}w_o| \\ &\le |E\{P_k'P_\ell\} - P'P| + |P'P - w_o'E\{F_kP_\ell\}| + |P'P - E\{P_k'F_\ell\}w_o| + |w_o|^2\|E\{F_kF_\ell\} - R_{xx}^2\| \\ &\le \rho_P(k,\ell) + |w_o|\cdot|R_{xx}P - E\{F_kP_\ell\}| + |w_o|\cdot|R_{xx}P - E\{F_\ell P_k\}| + |w_o|^2\rho_F(k,\ell).\end{aligned} \qquad (4.77)$$

The constant terms introduced in the third line cancel because $w_o'R_{xx}^2w_o = w_o'R_{xx}P = P'P$ by (4.5). Define

$$\rho_{FP}(k,\ell) = |R_{xx}P - E\{F_kP_\ell\}|. \qquad (4.78)$$

Then $\rho_C(k,\ell)$ can be bounded as

$$|\rho_C(k,\ell)| \le \rho_P(k,\ell) + |w_o|\,\rho_{FP}(k,\ell) + |w_o|\,\rho_{FP}(\ell,k) + |w_o|^2\rho_F(k,\ell). \qquad (4.79)$$

By (4.79), decay rates on $\rho_C(k,\ell)$ and $\rho_F(k,\ell)$ follow from decay rates on $\rho_P(k,\ell)$, $\rho_{FP}(k,\ell)$, and $\rho_F(k,\ell)$. Before specific examples are treated, expressions for $\rho_P$, $\rho_{FP}$, and $\rho_F$ are developed that are general enough to cover most of the algorithms treated in Chapter II.


Let $\{X_k\}_{k=-\infty}^{\infty}$ and $\{N_k\}_{k=-\infty}^{\infty}$ be sequences of $R^p$-valued zero-mean random variables, and let $\{s_k\}_{k=-\infty}^{\infty}$ be a sequence of real-valued zero-mean random variables. It is assumed that $E\{X_kX_k'\} = E\{N_kN_k'\}$ and that $E\{s_kN_{k+u}\} = 0$ for all integers $k$ and $u$. The $ij$th element of a matrix $A$ is denoted by $(A)_{i,j}$. All fourth-order moments are assumed to be stationary; e.g., $E\{s_\ell s_{\ell+i}(X_{\ell+j})_{m_1}(X_{\ell+k})_{m_2}\}$ is independent of $\ell$. Define

$$R_{xx}(u) = E\{X_kX_{k+u}'\}, \qquad (4.80)$$

$$P_s(u) = E\{s_kX_{k+u}\}, \qquad (4.81)$$

and

$$\rho_s(u) = E\{s_ks_{k+u}\}. \qquad (4.82)$$

Consistent with the notation used previously, $R_{xx}(0) = R_{xx}$ and $P_s(0) = P$.


The following definitions of $P_k$ and $F_k$ suffice for the purposes of the present analysis:

$$P_k = s_kX_k, \qquad (4.83)$$

$$F_k = X_kX_k'. \qquad (4.84)$$

Clearly, $E\{P_k\} = P$ and $E\{F_k\} = R_{xx}$, so (4.6) and (4.7) are satisfied. Most of the algorithms proposed for adaptive signal processing use $F_k = X_kX_k'$ and either $P_k = s_kX_k$ or $P_k = P = E\{s_kX_k\}$. When $P_k = P$, $\rho_P(k,\ell) = 0$ and $\rho_{FP}(k,\ell) = 0$, so only $\rho_F(k,\ell)$ need be considered.

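First consider $\rho_F(k,\ell)$. From (4.84),

$$E\{F_kF_{k+u}\} = E\{X_kX_k'X_{k+u}X_{k+u}'\}. \qquad (4.85)$$

If $\{X_k\}$ is a multivariate Gaussian random process (GRP), it follows easily from (4.80) that

$$E\{F_kF_{k+u}\} = R_{xx}^2 + R_{xx}^2(u) + R_{xx}(u)\,\mathrm{tr}(R_{xx}(u)). \qquad (4.86)$$

This uses the fact that if $Y_1$, $Y_2$, $Y_3$, and $Y_4$ are jointly normally distributed zero-mean random variables, then $E\{Y_1Y_2Y_3Y_4\} = E\{Y_1Y_2\}E\{Y_3Y_4\} + E\{Y_1Y_3\}E\{Y_2Y_4\} + E\{Y_1Y_4\}E\{Y_2Y_3\}$. In general, define $\kappa_1(u)$ such that

$$E\{F_kF_{k+u}\} = R_{xx}^2 + R_{xx}^2(u) + R_{xx}(u)\,\mathrm{tr}(R_{xx}(u)) + \kappa_1(u), \qquad (4.87)$$

so that

$$E\{F_kF_{k+u}\} - R_{xx}^2 = R_{xx}^2(u) + R_{xx}(u)\,\mathrm{tr}(R_{xx}(u)) + \kappa_1(u). \qquad (4.88)$$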
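Formula (4.86) can be spot-checked by simulation in the scalar case $p = 1$, where it reduces to $E\{x_k^2x_{k+u}^2\} = R_{xx}^2 + 2R_{xx}^2(u)$. The sketch assumes a hypothetical Gaussian AR(1) sequence, for which $R_{xx}(u) = a^{|u|}/(1-a^2)$. Agreement holds only up to Monte Carlo error, and the function name is hypothetical.

```python
import random

def fourth_moment_check(a=0.6, u=1, n=200000, seed=2):
    """Scalar (p = 1) check of (4.86) for a hypothetical zero-mean Gaussian AR(1) sequence
    x_k = a x_{k-1} + e_k: E{x_k^2 x_{k+u}^2} = R^2 + R(u)^2 + R(u) tr(R(u)) = R^2 + 2 R(u)^2,
    with R(u) = a^|u| / (1 - a^2). Returns (sample average, theoretical value)."""
    rng = random.Random(seed)
    R0 = 1.0 / (1.0 - a * a)
    xs = [rng.gauss(0.0, R0 ** 0.5)]              # stationary start
    for _ in range(n + u):
        xs.append(a * xs[-1] + rng.gauss(0.0, 1.0))
    sample = sum(xs[k] ** 2 * xs[k + u] ** 2 for k in range(n)) / n
    Ru = a ** abs(u) * R0
    theory = R0 ** 2 + 2.0 * Ru ** 2
    return sample, theory

sample, theory = fourth_moment_check()
print(sample, theory)    # theory is about 4.199 for a = 0.6, u = 1
```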


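Next consider $\rho_P(k,\ell)$. From (4.83),

$$E\{P_k' P_{k+u}\} = E\{s_k s_{k+u} X_k' X_{k+u}\}. \qquad (4.89)$$

In case $s_k$, $X_k$, $s_{k+u}$, and $X_{k+u}$ are jointly normal, then

$$E\{P_k' P_{k+u}\} = P'P + \rho_s(u)\,\mathrm{tr}(R_{xx}(u)) + P_s'(u) P_s(-u). \qquad (4.90)$$

In general, define $\kappa_2(u)$ such that

$$E\{s_k s_{k+u} X_k' X_{k+u}\} = P'P + \rho_s(u)\,\mathrm{tr}(R_{xx}(u)) + P_s'(u) P_s(-u) + \kappa_2(u). \qquad (4.91)$$

Then $\rho_P(k,k+u)$ can be determined from

$$E\{P_k' P_{k+u}\} - P'P = \rho_s(u)\,\mathrm{tr}(R_{xx}(u)) + P_s'(u) P_s(-u) + \kappa_2(u). \qquad (4.92)$$

It is important to reiterate that in case $P_k = P$, $\rho_P(k,k+u) = 0$. Finally, consider $\rho_{FP}(k,\ell)$. From (4.83) and (4.84),

$$E\{F_k P_{k+u}\} = E\{s_{k+u} X_k X_k' X_{k+u}\}. \qquad (4.93)$$

Proceeding as before, (4.93) can be expressed as

$$E\{F_k P_{k+u}\} = R_{xx} P + P_s(-u)\,\mathrm{tr}(R_{xx}(u)) + R_{xx}(u) P_s(-u) + \kappa_3(u), \qquad (4.94)$$

where $\kappa_3(u) = 0$ in the normal case. Hence, $\rho_{FP}(k,k+u)$ can be determined from (4.78) and

$$E\{F_k P_{k+u}\} - R_{xx} P = P_s(-u)\,\mathrm{tr}(R_{xx}(u)) + R_{xx}(u) P_s(-u) + \kappa_3(u). \qquad (4.95)$$

Again, in case $P_k = P = E\{s_k X_k\}$, $\rho_{FP}(k,\ell) = 0$.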
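The three Gaussian expansions (4.86), (4.90), and (4.94) rest only on the four-term moment factorization quoted above, so their bookkeeping can be checked mechanically. The sketch below is not from the original: it assumes a hypothetical scalar Gaussian sequence $z_t$ with autocovariance $0.6^{|\tau|}$, takes $(X_k)_i = z_{k-i}$ and $s_k = z_{k+\alpha}$ (so every stationarity assumption holds), evaluates each fourth moment by the factorization, and reports the largest discrepancy from the right-hand sides of (4.86), (4.90), and (4.94).

```python
def gamma(tau):
    """Autocovariance of a hypothetical zero-mean Gaussian sequence z_t."""
    return 0.6 ** abs(tau)


def cov(t1, t2):
    return gamma(t2 - t1)


def e4(t1, t2, t3, t4):
    """E{z_t1 z_t2 z_t3 z_t4} from the Gaussian four-term factorization."""
    return cov(t1, t2) * cov(t3, t4) + cov(t1, t3) * cov(t2, t4) + cov(t1, t4) * cov(t2, t3)


def max_error(p=3, u=2, alpha=1, k=0):
    """Largest gap between the fourth moments and the right sides of (4.86), (4.90), (4.94)."""
    xt = lambda kk, i: kk - i          # (X_kk)_i = z_{kk-i}, i = 0..p-1
    st = lambda kk: kk + alpha         # s_kk = z_{kk+alpha}
    idx = range(p)
    R = [[cov(xt(k, i), xt(k, j)) for j in idx] for i in idx]          # R_xx
    Ru = [[cov(xt(k, i), xt(k + u, j)) for j in idx] for i in idx]     # R_xx(u)
    tr = sum(Ru[i][i] for i in idx)
    P = [cov(st(k), xt(k, m)) for m in idx]                            # P
    Pu = [cov(st(k), xt(k + u, m)) for m in idx]                       # P_s(u)
    Pmu = [cov(st(k), xt(k - u, m)) for m in idx]                      # P_s(-u)
    rho = cov(st(k), st(k + u))                                        # rho_s(u)
    errs = []
    for i in idx:                                                      # (4.86)
        for j in idx:
            lhs = sum(e4(xt(k, i), xt(k, m), xt(k + u, m), xt(k + u, j)) for m in idx)
            rhs = (sum(R[i][m] * R[m][j] for m in idx)
                   + sum(Ru[i][m] * Ru[m][j] for m in idx) + Ru[i][j] * tr)
            errs.append(abs(lhs - rhs))
    lhs = sum(e4(st(k), st(k + u), xt(k, m), xt(k + u, m)) for m in idx)  # (4.90)
    rhs = sum(x * x for x in P) + rho * tr + sum(x * y for x, y in zip(Pu, Pmu))
    errs.append(abs(lhs - rhs))
    for i in idx:                                                      # (4.94)
        lhs = sum(e4(xt(k, i), xt(k, m), st(k + u), xt(k + u, m)) for m in idx)
        rhs = (sum(R[i][m] * P[m] for m in idx) + Pmu[i] * tr
               + sum(Ru[i][m] * Pmu[m] for m in idx))
        errs.append(abs(lhs - rhs))
    return max(errs)


print(max_error(), max_error(p=4, u=5, alpha=2))   # both at floating-point rounding level
```

A rounding-level discrepancy only confirms the algebra of the normal-case expansions; it says nothing about the size of $\kappa_1$, $\kappa_2$, $\kappa_3$ for non-normal data.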

A useful fact for the application of the above results is that for a $p \times p$ matrix $A$,

$$\|A\| \le \sum_{j=1}^{p} \Big( \sum_{i=1}^{p} (A)_{i,j}^2 \Big)^{1/2}, \qquad (4.96)$$

i.e., the norm of $A$ is bounded by the sum of the Euclidean lengths of its columns (or rows), as shown, e.g., by Rudin [62]. Define


$$g_1(u) = \max_{1 \le i,j \le p} |(R_{xx}(u))_{i,j}| \qquad (4.97)$$

and

$$g_2(u) = \max_{1 \le m \le p} |(P_s(u))_m|; \qquad (4.98)$$

then, from (4.96), $\|R_{xx}(u)\| \le p^{3/2} g_1(u)$. From (4.67), (4.88), and (4.97),

$$\rho_F(k,k+u) \le p^3 g_1^2(u) + \|\kappa_1(u)\|. \qquad (4.99)$$
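The chain behind $\|R_{xx}(u)\| \le p^{3/2} g_1(u)$ (operator norm, then the column-length sum (4.96), then $p$ columns of length at most $p^{1/2} g_1(u)$) can be illustrated numerically. In this sketch, which is an illustration and not part of the original analysis, the operator norm is estimated by power iteration (which can only under-estimate it), and the test matrix has entries $\gamma(u+i-j)$ for a hypothetical sequence $\gamma(\tau) = 0.9^{|\tau|}\cos(0.7\tau)$.

```python
import math


def spectral_norm(A, iters=200):
    """Estimate the largest singular value of A by power iteration on A^T A (a lower estimate)."""
    rows, cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(A[i][j] * Av[i] for i in range(rows)) for j in range(cols)]
        nw = math.sqrt(sum(x * x for x in w))
        if nw == 0.0:
            break
        v = [x / nw for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in Av))


def column_bound(A):
    """Right-hand side of (4.96): the sum of the Euclidean lengths of the columns."""
    return sum(math.sqrt(sum(A[i][j] ** 2 for i in range(len(A)))) for j in range(len(A[0])))


def gamma(tau):
    return 0.9 ** abs(tau) * math.cos(0.7 * tau)     # hypothetical correlation sequence


p, u = 5, 3
Ru = [[gamma(u + i - j) for j in range(p)] for i in range(p)]   # plays the role of R_xx(u)
g1 = max(abs(x) for row in Ru for x in row)                      # g_1(u) of (4.97)
print(spectral_norm(Ru), column_bound(Ru), p ** 1.5 * g1)        # non-decreasing left to right
```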


From (4.92), (4.97), and (4.98),

$$\rho_P(k,k+u) \le p\,|\rho_s(u)|\,g_1(u) + p\,g_2^2(u) + |\kappa_2(u)|. \qquad (4.100)$$

From (4.78), (4.95), (4.97), and (4.98),

$$\rho_{FP}(k,k+u) \le p^{3/2} g_1(u)\,g_2(u) + p^2 g_1(u)\,g_2(u) + |\kappa_3(u)|. \qquad (4.101)$$

Now, from (4.79), (4.100), (4.101), and (4.99),

$$|\rho_C(k,k+u)| \le p\,|\rho_s(u)|\,g_1(u) + 2p^2 |w_0|\,g_1(u)\,g_2(u) + p^3 |w_0|^2 g_1^2(u) + p\,g_2^2(u) + |w_0|^2 \|\kappa_1(u)\| + |\kappa_2(u)| + 2|w_0|\,|\kappa_3(u)|, \qquad (4.102)$$

by noting that $g_1(u)$ and $g_2(u)$ are even functions. Finally, by defining

$$g(u) = \max\big(g_1(u),\, g_2(u),\, |\rho_s(u)|\big), \qquad (4.103)$$

there exist positive constants $a_1$, $a_2$, $a_3$, and $a_4$ such that

$$\max\big(\rho_F(k,k+u),\, |\rho_C(k,k+u)|\big) \le a_1 g^2(u) + a_2 \|\kappa_1(u)\| + a_3 |\kappa_2(u)| + a_4 |\kappa_3(u)|. \qquad (4.104)$$

THE  NORMAL  CASE 

In case $\{X_k\}$, $\{s_k\}$ are GRP's, then $\|\kappa_1(u)\| = |\kappa_2(u)| = |\kappa_3(u)| = 0$, so that

$$\max\big(\rho_F(k,k+u),\, |\rho_C(k,k+u)|\big) \le a_1 g^2(u). \qquad (4.105)$$

Also, in the normal case, $E\{\|F_n\|^2\}$ is bounded. Hence, all algorithms of the form of (3.1), with $F_k = X_k X_k'$, $P_k = s_k X_k$ or $P_k = P = E\{s_k X_k\}$, and $\mu_k \to 0$ $(k \to \infty)$, $\lim_{k\to\infty} k\mu_k > 0$, satisfy the hypotheses of Corollary 2 and hence converge almost surely provided that $g(u)$ in (4.105) is $O(u^{-1/2})$. This result suggests that essentially all one needs to do to establish a.s. convergence for this class of algorithms in the normal case is to ensure that all scalar correlation functions $\gamma(u)$ which can be computed for $\{s_k\}$, $\{X_k\}$ satisfy $\limsup_{u\to\infty} u^{1/2}|\gamma(u)| < \infty$.

EXAMPLE  1 

Let $\{n(t): -\infty < t < \infty\}$ and $\{s(t): -\infty < t < \infty\}$ be zero-mean, jointly wide-sense stationary, finite-variance Gaussian random processes. Define $x(t) = n(t) + s(t)$, and assume that $E\{s(t)\,n(t+\tau)\} = 0$ for all $t, \tau$. Define the "data vector" $X'(t) = (x(t), x(t-D), \ldots, x(t-(p-1)D))$. Suppose that it is desired to form a linear MMSE estimate of $s(t+\alpha)$ at $t = kT$, $k = 0, 1, 2, \ldots$, based on the "data vector" $X_k = X(t)|_{t=kT}$, where $D$ is an integer multiple of $T$. Denoting $s(t+\alpha)|_{t=kT}$ by $s_k$, it is easily shown that the desired linear MMSE estimate of $s_k$ is given by $\hat s_k = w_0'X_k$, where $w_0$ is the (assumed unique) solution of $R_{xx}w = P$, $R_{xx} = E\{X_k X_k'\}$, and $P = E\{s_k X_k\}$. Defining $\gamma_x(\tau) = E\{x(t)x(t+\tau)\}$, $\gamma_n(\tau) = E\{n(t)n(t+\tau)\}$, $\gamma_s(\tau) = E\{s(t)s(t+\tau)\}$, $R_{xx}(u) = E\{X_k X_{k+u}'\}$, and $P_s(u) = E\{s_k X_{k+u}\}$, it is easily seen that

$$(R_{xx}(u))_{i,j} = \gamma_x(uT + (i-j)D) = \gamma_s(uT + (i-j)D) + \gamma_n(uT + (i-j)D) \qquad (4.106)$$

and

$$(P_s(u))_m = \gamma_s(uT - \alpha - (m-1)D). \qquad (4.107)$$

Define $S_s(f) = \mathcal{F}\{\gamma_s(\tau)\}$ and $S_n(f) = \mathcal{F}\{\gamma_n(\tau)\}$ to be the spectral densities for the processes $s(t)$ and $n(t)$, respectively. Suppose the signal spectral density is the rational density


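$$S_s(f) = \frac{b_2}{f^2 + b_1^2}, \qquad (4.108)$$

and the noise spectral density is the ideal lowpass density

$$S_n(f) = \begin{cases} b_3, & |f| < B, \\ 0, & |f| > B, \end{cases} \qquad (4.109)$$

where $b_1$, $b_2$, and $b_3$ are positive constants. Then

$$\gamma_s(\tau) = \frac{\pi b_2}{b_1}\, e^{-2\pi b_1 |\tau|} \qquad (4.110)$$

and

$$\gamma_n(\tau) = 2 b_3 B\, \frac{\sin 2\pi B\tau}{2\pi B\tau}. \qquad (4.111)$$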
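For Example 1 the covariances (4.106)-(4.111) are explicit, so the decay of $g(u)$ in (4.103) can be examined directly. All parameter values below ($b_1$, $b_2$, $b_3$, $B$, $T$, $D$, $\alpha$, $p$) are arbitrary assumptions chosen for illustration, and $\rho_s(u) = \gamma_s(uT)$ follows from $s_k = s(kT+\alpha)$. At large lags the sinc-shaped noise covariance dominates, so $u\,g(u)$ stays bounded and $g(u)$ is comfortably $O(u^{-1/2})$.

```python
import math

B1, B2, B3, BAND = 1.0, 1.0, 0.5, 0.7      # b_1, b_2, b_3, B (hypothetical values)
T, D, ALPHA, P_DIM = 1.0, 2.0, 0.5, 4      # sampling period, tap spacing, lead alpha, p


def gamma_s(tau):
    """Signal autocovariance (4.110)."""
    return math.pi * B2 / B1 * math.exp(-2.0 * math.pi * B1 * abs(tau))


def gamma_n(tau):
    """Noise autocovariance (4.111)."""
    if tau == 0:
        return 2.0 * B3 * BAND
    x = 2.0 * math.pi * BAND * tau
    return 2.0 * B3 * BAND * math.sin(x) / x


def g(u):
    """g(u) of (4.103), built from (4.97), (4.98), (4.106), and (4.107)."""
    lags = [u * T + (i - j) * D for i in range(1, P_DIM + 1) for j in range(1, P_DIM + 1)]
    g1 = max(abs(gamma_s(t) + gamma_n(t)) for t in lags)
    g2 = max(abs(gamma_s(u * T - ALPHA - (m - 1) * D)) for m in range(1, P_DIM + 1))
    return max(g1, g2, abs(gamma_s(u * T)))


print(max(u * g(u) for u in range(50, 2001)))   # stays well below 1 over this range
```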


It is easily seen that for this example, $g(u)$ defined by (4.103) is $O(u^{-1})$. Suppose that $P$ is known and consider the algorithm

$$W_{k+1} = W_k + \frac{1}{k}\,(P - X_k X_k' W_k) \qquad (4.112)$$

for $k \ge 1$, with $W_1$ arbitrary. Clearly, all of the assumptions of Corollary 2 are satisfied and hence, $W_k \to w_0$ a.s. as $k \to \infty$. Now, suppose that $P$ is unknown but $s_k$ is available. Then the algorithm

$$W_{k+1} = W_k + \frac{1}{k}\,(s_k X_k - X_k X_k' W_k) \qquad (4.113)$$


will converge a.s. to $w_0$. It is easily shown that algorithms such as

$$W_{k+1} = W_k + \frac{1}{k}\Big(P - \frac{1}{K}\sum_{i=k-K+1}^{k} X_i X_i' W_k\Big) \qquad (4.114)$$

and

$$W_{k+1} = W_k + \frac{1}{kK}\sum_{i=k-K+1}^{k} \big(s_i X_i - X_i X_i' W_k\big) \qquad (4.115)$$

will also converge a.s. to $w_0$ for any finite positive integer $K$.
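Algorithms (4.112) and (4.113) are easy to simulate. The sketch below uses a hypothetical two-tap model rather than Example 1 itself: $s_k$ is a unit-variance Gaussian AR(1) sequence with coefficient 0.5, $n_k$ is independent unit-variance white Gaussian noise, $x_k = s_k + n_k$, and $X_k = (x_k, x_{k-1})'$. For this model $R_{xx}$ has rows $(2, 0.5)$ and $(0.5, 2)$ and $P = (1, 0.5)'$, so $w_0 = (7/15, 2/15)'$. Both eigenvalues of $R_{xx}$ (1.5 and 2.5) exceed 1/2, so with the gain $1/k$ a finite run is not dominated by a slowly decaying transient.

```python
import math
import random


def run(n_steps=100_000, seed=1, known_p=None):
    """Iterate (4.113), or (4.112) when known_p = P is supplied, on correlated Gaussian data."""
    rng = random.Random(seed)
    a = 0.5                                          # AR(1) coefficient of s_k (assumption)
    s_prev = rng.gauss(0.0, 1.0)                     # s_0 from the stationary distribution
    x_prev = s_prev + rng.gauss(0.0, 1.0)
    w = [0.0, 0.0]                                   # W_1 = 0 (arbitrary)
    for k in range(1, n_steps + 1):
        s = a * s_prev + rng.gauss(0.0, math.sqrt(1.0 - a * a))
        x = s + rng.gauss(0.0, 1.0)
        X = (x, x_prev)                              # X_k = (x_k, x_{k-1})'
        xw = X[0] * w[0] + X[1] * w[1]               # X_k' W_k
        if known_p is None:
            step = [s * X[0] - X[0] * xw, s * X[1] - X[1] * xw]       # s_k X_k - X_k X_k' W_k
        else:
            step = [known_p[0] - X[0] * xw, known_p[1] - X[1] * xw]   # P - X_k X_k' W_k
        w = [w[0] + step[0] / k, w[1] + step[1] / k]                  # gain mu_k = 1/k
        s_prev, x_prev = s, x
    return w


W0 = (7 / 15, 2 / 15)
print(run(), run(known_p=(1.0, 0.5)), W0)
```

The data are correlated from one step to the next, which is the situation the present convergence results are designed for; a single seeded run is of course only an illustration, not evidence of almost sure convergence.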


The above example shows the ease with which the assumptions of Corollary 2 can be established for a rather large family of algorithms in the normal case. A straightforward extension of Example 1 to arbitrary rational spectral densities yields identical conclusions; i.e., if $n(t)$ and/or $s(t)$ in Example 1 are finite-order autoregressive moving average processes, the conclusions remain unchanged. Extensions of Example 1 to the adaptive array processing of homogeneous random fields are straightforward, but notationally somewhat cumbersome.

The application of Corollary 2 to the non-normal case is, in general, more difficult than Example 1 suggests for the normal case. Two possible approaches for the non-normal case are as follows: (i) compute bounds on $\rho_C(k,\ell)$ and $\rho_F(k,\ell)$, either directly or via (4.78) and (4.79), and apply Corollary 2 directly, or (ii) compute bounds on the fourth cumulant functions $\kappa_1(u)$, $\kappa_2(u)$, and $\kappa_3(u)$, apply (4.104), and then apply Corollary 2. Example 2 below considers a rather special case of the former approach. An additional difficulty arises in the non-normal case in establishing that $E\{\|F_n\|^2\}$ is bounded.


EXAMPLE  2 

Let $\{s_k\}_{k=-\infty}^{\infty}$ and $\{n_k\}_{k=-\infty}^{\infty}$ be independent, zero-mean, finite-variance, wide-sense stationary stochastic processes. Assume that both $\{n_k\}$ and $\{s_k\}$ are $M$-dependent. Recall the definition of $M$-dependence from Chapter III. Define $x_k = s_k + n_k$ and $X_k' = (x_k, x_{k-1}, \ldots, x_{k-p+1})$. Define $F_k = X_k X_k'$ and assume that $E\{\|F_k\|^q\}$, $q > 2$, is bounded. Suppose that it is desired to form a linear MMSE estimate of $s_k$ based on the data vector $X_k$. The desired estimate is easily shown to be $\hat s_k = w_0'X_k$, where $w_0$ is the (assumed unique) solution to $R_{xx}w = P$, $R_{xx} = E\{X_k X_k'\}$, and $P = E\{s_k X_k\}$. It is easily seen that $\rho_F(k,k+u) = \|E\{X_k X_k' X_{k+u} X_{k+u}'\} - R_{xx}^2\| = 0$ for all $u > M_1$, for some $M_1 > M$. Similarly, $\rho_{FP}(k,k+u)$ and $\rho_P(k,k+u)$ are easily shown to be zero for all $|u| > M_2$ (for some $M_2 > M$) for either $P_k = s_k X_k$ or $P_k = P = E\{s_k X_k\}$. Letting $\mu_k = k^{-1}$, all of the assumptions of Corollary 2 have been established. It is not difficult to show that algorithms such as (4.114) and (4.115) will also converge a.s. to $w_0$.
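Example 2 relies on the fact that $M$-dependent sequences are uncorrelated (indeed independent) beyond lag $M$. A convenient source of $M$-dependent sequences is a finite moving average of i.i.d. innovations. The sketch below uses hypothetical moving-average coefficients with $M = 2$ and $p = 3$, computes $\gamma_x(u)$ and the quantity $g_1(u)$ of (4.97), and shows that $g_1(u)$ vanishes once $u > M + p - 1$. It checks only second-order structure; the vanishing of $\rho_F$, $\rho_P$, and $\rho_{FP}$ in Example 2 uses the stronger independence property.

```python
def ma_autocov(theta, u):
    """Lag-u autocovariance of y_k = sum_j theta[j] e_{k-j}, {e_k} i.i.d. with unit variance."""
    u = abs(u)
    return sum(theta[j] * theta[j + u] for j in range(len(theta) - u))


THETA_S = [1.0, 0.5, -0.3]   # s_k: hypothetical MA(2), hence 2-dependent
THETA_N = [1.0, 0.8]         # n_k: hypothetical MA(1), hence 1-dependent (M = 2 covers both)
M, P_DIM = 2, 3


def gamma_x(u):
    """Autocovariance of x_k = s_k + n_k with s and n independent."""
    return ma_autocov(THETA_S, u) + ma_autocov(THETA_N, u)


def g1(u):
    """g_1(u) of (4.97); entry (i, j) of R_xx(u) is gamma_x(u + i - j) for this data vector."""
    return max(abs(gamma_x(u + i - j)) for i in range(P_DIM) for j in range(P_DIM))


print([round(g1(u), 3) for u in range(0, 8)])   # zero from u = M + p onward
```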


A slight generalization of the result summarized by (4.104) and (4.105) seems to be useful for algorithms having the form of (3.1) with

$$P_k = \frac{1}{K_k} \sum_{i=k-K_k+1}^{k} s_i X_i \qquad (4.116)$$

and

$$F_k = \frac{1}{K_k} \sum_{i=k-K_k+1}^{k} X_i X_i', \qquad (4.117)$$

where $K_k$ is a positive integer-valued function of $k$. Clearly, $E\{P_k\} = P$ and $E\{F_k\} = R_{xx}$, so that (4.6) and (4.7) are satisfied. In case $K_k = 1$, (4.116) and (4.117) reduce to (4.83) and (4.84), respectively. Denoting the right-hand side of (4.104) by $h(u)$, (4.104) can be restated for the case at hand as


$$\max\big(\rho_F(k,k+u),\, |\rho_C(k,k+u)|\big) \le \frac{1}{K_k K_{k+u}}\,\alpha_{k,u}, \qquad \alpha_{k,u} \equiv \sum_m \sum_n h(n-m), \qquad (4.118)$$

where the sums are over the index values $k - K_k + 1 \le m \le k$ and $k + u - K_{k+u} + 1 \le n \le k + u$. The techniques used in Lemma 7 can be applied to the double sum appearing in (4.118) to obtain

$$\alpha_{k,u} = \sum_{v=u-K_{k+u}+1}^{u+K_k-1} h(v)\,\delta_{k,u,v}, \qquad (4.119)$$

where $\delta_{k,u,v} = \min(0, u-v) - \max(-K_k,\, u-v-K_{k+u})$. In case $K_k = K$ (a constant), then

$$\alpha_{k,u} = \sum_m \sum_n h(n-m) = \sum_{v=u-K+1}^{u+K-1} h(v)\,(K - |v-u|). \qquad (4.120)$$


Another special case of interest is $K_k = k$; then, for $k + u \ge 1$,

$$\alpha_{k,u} = \sum_m \sum_n h(n-m) = \sum_{v=1-k}^{\min(u,0)} h(v)(k+v) + \min(k, k+u) \sum_{v=\min(u,0)+1}^{\max(u,0)} h(v) + \sum_{v=\max(u+1,1)}^{u+k-1} h(v)(u-v+k), \qquad (4.121)$$

where, by convention, $\sum_{a}^{b} = 0$ if $b < a$.
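The counting behind (4.119)-(4.121) can be verified by brute force. The sketch enumerates the index pairs $(m, n)$ of (4.118) for a test function $h$ (an arbitrary integer-valued assumption, so every comparison is exact) and compares the double sum with (4.119), with (4.120) for constant $K_k = K$, and with (4.121) for $K_k = k$, where $k + u \ge 1$ is needed so that $K_{k+u}$ is a positive integer.

```python
def alpha_brute(h, k, u, K):
    """Double sum of (4.118): k-K(k)+1 <= m <= k and k+u-K(k+u)+1 <= n <= k+u."""
    return sum(h(n - m)
               for m in range(k - K(k) + 1, k + 1)
               for n in range(k + u - K(k + u) + 1, k + u + 1))


def alpha_119(h, k, u, K):
    """Single sum (4.119) with multiplicities delta_{k,u,v}."""
    Kk, Kku = K(k), K(k + u)
    return sum(h(v) * (min(0, u - v) - max(-Kk, u - v - Kku))
               for v in range(u - Kku + 1, u + Kk))


def alpha_120(h, u, K):
    """(4.120): constant window length K."""
    return sum(h(v) * (K - abs(v - u)) for v in range(u - K + 1, u + K))


def alpha_121(h, k, u):
    """(4.121): window length K_k = k; requires k >= 1 and k + u >= 1."""
    lo, hi = min(u, 0), max(u, 0)
    total = sum(h(v) * (k + v) for v in range(1 - k, lo + 1))
    total += min(k, k + u) * sum(h(v) for v in range(lo + 1, hi + 1))
    total += sum(h(v) * (u - v + k) for v in range(max(u + 1, 1), u + k))
    return total


h = lambda v: v * v - 3 * v + 7          # hypothetical test function
print(alpha_brute(h, 5, 2, lambda j: j), alpha_121(h, 5, 2))   # the two values agree
```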


The result (4.121) can indeed be used to examine the convergence properties of algorithms having the form of (3.1) with

$$P_k = \frac{1}{k} \sum_{i=1}^{k} s_i X_i \qquad (4.122)$$

and

$$F_k = \frac{1}{k} \sum_{i=1}^{k} X_i X_i'. \qquad (4.123)$$

This resulting algorithm seems to be of interest for several reasons and is treated from an alternative viewpoint in detail in the following section.


C. A Simple a.s. Convergence Result

In this section, a simple a.s. convergence result is established which does not require all of the machinery developed in Section IV-A. The result is of interest because of its simplicity and the information it provides on the convergence rate for algorithms satisfying the rather restrictive assumptions made in the theorem stated below.

THEOREM. Let $\{W_n\}$ be given by (3.1) with $W_1 = w_1$ arbitrary, and let $\{\mu_n\}$, $\{a_n\}$, and $\{b_n\}$ be sequences of non-negative numbers (possibly random) satisfying

$$\|F_n - R_{xx}\| \le a_n \quad \text{a.s.} \qquad (4.124)$$

and

$$\|F_n w_0 - P_n\| \le b_n \quad \text{a.s.} \qquad (4.125)$$

Furthermore, suppose that there exists a positive integer $n_0$ (possibly random) such that for all $n \ge n_0$, $0 < \mu_n(\lambda_{\min} - a_n) < 1$ a.s., where $\lambda_{\min} = \lambda_{\min}(R_{xx})$. Then for all $n \ge n_0$,
$$\|V_{n+1}\| \le \|V_{n_0}\| \prod_{k=n_0}^{n} (1 - \mu_k d_k) + \max_{n_0 \le k \le n} \left(\frac{b_k}{d_k}\right) \Big[1 - \prod_{j=n_0}^{n} (1 - \mu_j d_j)\Big] \quad \text{a.s.}, \qquad (4.126)$$

where $d_k = \lambda_{\min} - a_k$. Furthermore, if $\sum_k \mu_k d_k = \infty$ a.s. and $b_k/d_k \to 0$ a.s. as $k \to \infty$, then $\|V_n\| \to 0$ a.s. as $n \to \infty$.

PROOF. From (4.2) through (4.7),

$$V_{n+1} = V_n - \mu_n(F_n V_n + F_n w_0 - P_n) = V_n - \mu_n R_{xx} V_n - \mu_n(F_n V_n + F_n w_0 - P_n - R_{xx} V_n), \qquad (4.127)$$

so that for all $n \ge n_0$,

$$\|V_{n+1}\| \le (1 - \mu_n \lambda_{\min})\|V_n\| + \mu_n \|F_n - R_{xx}\|\,\|V_n\| + \mu_n \|F_n w_0 - P_n\| \le (1 - \mu_n d_n)\|V_n\| + \mu_n d_n\,(b_n/d_n) \quad \text{a.s.} \qquad (4.128)$$

Iterating (4.128), for all $n \ge n_0$,

$$\|V_{n+1}\| \le \|V_{n_0}\| \prod_{k=n_0}^{n} (1 - \mu_k d_k) + \sum_{k=n_0}^{n} \Big[\prod_{j=k+1}^{n} (1 - \mu_j d_j)\Big] \mu_k d_k\,(b_k/d_k) \quad \text{a.s.} \qquad (4.129)$$

Since all terms appearing in the sum in (4.129) are a.s. non-negative, (4.126) follows immediately from (4.129) with the aid of Lemma 3. Furthermore, if $\sum_k \mu_k d_k = \infty$ a.s. and $b_k/d_k \to 0$ a.s. as $k \to \infty$, (4.129) and Lemma 4 show that $\|V_n\| \to 0$ a.s. as $n \to \infty$.

Q.E.D.
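The bound (4.126) is what one obtains by unrolling (4.128) and using $\sum_{k} \big[\prod_{j>k}(1-\mu_j d_j)\big]\mu_k d_k = 1 - \prod_j (1-\mu_j d_j)$. The sketch below iterates the scalar recursion obtained when (4.128) holds with equality, for hypothetical sequences $\mu_k = 1/k$, $d_k = 0.8 + 0.1\sin k$, and $b_k = k^{-1/2}$ (assumptions, not taken from the text), and asserts at every step that the iterate stays below the right-hand side of (4.126). Because $\sum_k \mu_k d_k = \infty$ and $b_k/d_k \to 0$, the iterate also drifts toward zero, slowly.

```python
import math


def check_bound(n0=5, n_end=400, v0=3.0):
    """Iterate the worst case of (4.128) from ||V_{n0}|| = v0 and check (4.126) at every step."""
    v, prod, worst = v0, 1.0, 0.0
    for n in range(n0, n_end + 1):
        mu, d, b = 1.0 / n, 0.8 + 0.1 * math.sin(n), 1.0 / math.sqrt(n)
        assert 0.0 < mu * d < 1.0                    # the theorem's condition for n >= n0
        v = (1.0 - mu * d) * v + mu * d * (b / d)    # (4.128) with equality gives ||V_{n+1}||
        prod *= 1.0 - mu * d                         # prod_{j=n0}^{n} (1 - mu_j d_j)
        worst = max(worst, b / d)                    # max_{n0 <= k <= n} b_k / d_k
        assert v <= v0 * prod + worst * (1.0 - prod) + 1e-12   # the bound (4.126)
    return v


print(check_bound(), check_bound(n_end=4000))   # the second value is the smaller one
```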


In order for the above theorem to provide useful information regarding convergence rate, the sequences $\{a_n\}$ and $\{d_n\}$ and the integer $n_0$ must be known. As mentioned in the previous section, one application of the above theorem is to algorithms having the form of (3.1) with $P_k$ and $F_k$ given by (4.122) and (4.123), respectively. For this case, the results of Serfling ([58], [59]) presented in Section IV-A as Lemma 5 can be applied to establish (4.124) and (4.125). It seems that such algorithms should converge with the fastest convergence rate of any stochastic approximation algorithms under consideration, since $F_n \to R_{xx}$ and $P_k \to P$. It seems likely that the above theorem can be used to choose $\{\mu_k\}$ to maximize the convergence rate for such algorithms. An important special case of the above theorem is the (deterministic) steepest descent algorithm.


D.  Discussion 

In this chapter, new a.s. convergence results are developed, applied, and discussed. In Section IV-A, the main results of this work are developed, and the extreme ease with which these results can be applied, at least in the normal case, is illustrated in Section IV-B. Indeed, it is shown that algorithms (4.112) and (4.113) converge a.s. to $w_0$ in the normal case if $X_k$, $s_k$ are samples of finite-variance, finite-order autoregressive moving average processes, a case of great practical interest. Although these results seem to be the strongest convergence results yet obtained under the weakest conditions, several issues remain open. In practice, convergence rate is an extremely important issue; this problem is treated in Section IV-C, but under overly restrictive conditions. The asymptotic distribution of $V_n$ seems to be a topic of considerable theoretical interest. Truncated algorithms such as (3.41), as well as algorithms using a random "gain sequence," also seem to be of interest. Practically, the most important issue is probably an analytical investigation of convergence properties in nonstationary environments.


V.  SPECIAL  FORMS  OF  DATA  CORRELATION  MATRICES 


In Chapters II through IV, the stochastic solution of the linear equation

$$R_{xx} w = P, \qquad (5.1)$$

where $R_{xx}$ is a $p \times p$ correlation matrix and $P$ is a $p \times 1$ correlation vector, is considered. In case $R_{xx}$ and $P$ are known, the required solution, $w_0$, of (5.1) can be obtained directly. This chapter is devoted to computationally efficient techniques for solving (5.1) when $R_{xx}$ is either a Toeplitz matrix, i.e., the $ij$th element of $R_{xx}$ is a function only of $i - j$, or a "block" Toeplitz matrix, i.e., for $p = ML$, there are $M^2$ $L \times L$ submatrices of $R_{xx}$ arranged in a Toeplitz form. The results of this chapter are computational algorithms which require far less computer storage and computation time than standard numerical techniques for solving (5.1).

A.  Motivation:  Array  Processing  of  Homogeneous  Fields 

An important application of linear filtering theory is to the estimation of some component of a scalar-valued homogeneous random field. Let $\xi_n(t,x,y,z)$ be a scalar homogeneous random field for $n = 1, 2, \ldots, N$. Let $t$ denote time and $(x,y,z)$ denote spatial coordinates in some suitable Cartesian coordinate system. Furthermore, assume that the $\xi_n(\cdot,\cdot,\cdot,\cdot)$ are zero mean and uncorrelated, i.e., that $E\{\xi_n(t_1,x_1,y_1,z_1)\,\bar\xi_m(t_2,x_2,y_2,z_2)\} = 0$ for all $n \ne m$ and for all $t_1, t_2, x_1, x_2, y_1, y_2, z_1, z_2$, where the bar denotes complex conjugate. Then with $(\Delta t, \Delta x, \Delta y, \Delta z) = (t_2 - t_1,\, x_2 - x_1,\, y_2 - y_1,\, z_2 - z_1)$ and

$$x(t,x,y,z) = \sum_{n=1}^{N} \xi_n(t,x,y,z), \qquad (5.2)$$

the autocorrelation function for $x$ is given by

$$\rho_x(\Delta t, \Delta x, \Delta y, \Delta z) = \sum_{n=1}^{N} \rho_n(\Delta t, \Delta x, \Delta y, \Delta z), \qquad (5.3)$$

where $\rho_x(\Delta t, \Delta x, \Delta y, \Delta z) = E\{x(t_1,x_1,y_1,z_1)\,\bar x(t_2,x_2,y_2,z_2)\}$ and $\rho_n(\Delta t, \Delta x, \Delta y, \Delta z) = E\{\xi_n(t_1,x_1,y_1,z_1)\,\bar\xi_n(t_2,x_2,y_2,z_2)\}$. From (5.3), $x(t,x,y,z)$ is a scalar homogeneous random field.

Suppose that there are $L$ sensors located at coordinates $p_i = (x_i, y_i, z_i)$ $(1 \le i \le L)$ and that following the output of each sensor is a tapped delay line having $M$ equally spaced taps. Assume that all $L$ delay lines are identical and have a time delay of $D$ between adjacent taps. Define the $p$-element "data vector" $(p = ML)$ by

$$X'(t) = \big(x(t,p_1), x(t,p_2), \ldots, x(t,p_L),\; x(t-D,p_1), x(t-D,p_2), \ldots, x(t-D,p_L),\; \ldots,\; x(t-(M-1)D,p_1), x(t-(M-1)D,p_2), \ldots, x(t-(M-1)D,p_L)\big). \qquad (5.4)$$

Note the data is ordered so the first $L$ elements correspond to data observed at the input to the array at time $t$, the second $L$ elements correspond to data observed at the input to the array at time $t - D$, and so on.


Suppose that it is desired to form a linear MMSE estimate of

$$s(t) = \xi_1(t - d,\, p_r) \qquad (5.5)$$

at time $t = kT$, $k = 0, 1, \ldots$, based on the data vector $X_k = X(t)|_{t=kT}$. It is noted that $p_r$ need not correspond to one of the physical sensor locations and that $d$ need not be an integer multiple of $T$. It is assumed that $D$ is an integer multiple of $T$. Denoting $s(t)|_{t=kT}$ by $s_k$, it is easily shown that the desired estimate is given by $\hat s_k = w_0'X_k$, where $w_0$ is the (assumed unique) solution of $R_{xx}w = P$, $R_{xx} = E\{X_k \bar X_k'\}$, and $P = E\{\bar s_k X_k\}$. In order to examine the special forms of $R_{xx}$ that can arise in this application, it is convenient to note that for $m = 1, 2, \ldots, ML\;(= p)$, the $m$th element of $X(t)$ is given by

$$(X(t))_m = x(t - q_m D,\; p_{m - q_m L}), \qquad (5.6)$$

where $q_m = [(m-1)/L]$, and $[\cdot]$ denotes the largest integer part. Then the $ij$th element of $R_{xx}$ is given by

$$(R_{xx})_{i,j} = E\{x(kT - q_i D,\, p_{i-q_iL})\,\bar x(kT - q_j D,\, p_{j-q_jL})\} = \rho_x\big((q_i - q_j)D,\; p_{j-q_jL} - p_{i-q_iL}\big). \qquad (5.7)$$

Similarly, the $m$th element of $P$ is given by

$$(P)_m = (E\{\bar s_k X_k\})_m = E\{\bar\xi_1(kT - d,\, p_r)\,x(kT - q_m D,\, p_{m-q_mL})\} = \rho_1\big(q_m D - d,\; p_r - p_{m-q_mL}\big), \qquad (5.8)$$

where the last equality follows from (5.2) and the uncorrelated assumption.

Now, some interpretations of the $\xi_n$ are in order. Suppose that $\xi_2$, e.g., corresponds to "sensor noise" which is uncorrelated from sensor to sensor. Then

$$\rho_2\big((q_i - q_j)D,\; p_{j-q_jL} - p_{i-q_iL}\big) = \rho_2\big((q_i - q_j)D,\; 0\big)\,\delta_{j-q_jL,\; i-q_iL}, \qquad (5.9)$$

where $\delta$ is the Kronecker delta. Suppose further that the remaining $\xi_n$ are propagating plane waves. Then

$$\rho_n\big((q_i - q_j)D,\; p_{j-q_jL} - p_{i-q_iL}\big) = \rho_n\big((q_i - q_j)D - \tau_n(p_{j-q_jL} - p_{i-q_iL}),\; 0\big) \qquad (5.10)$$

for $n \in \{1, 3, 4, \ldots, N\}$, where $\tau_n(p_{\ell_2} - p_{\ell_1})$ is the propagation time from sensor $\ell_1$ to sensor $\ell_2$ for $\xi_n$, which is clearly a function of the propagation velocity, the direction cosines of the propagation direction, and the distance between sensors. From (5.7), (5.3), (5.9), and (5.10),

$$(R_{xx})_{i,j} = \sum_{\substack{n=1 \\ n \ne 2}}^{N} \rho_n\big((q_i - q_j)D - \tau_n(p_{j-q_jL} - p_{i-q_iL}),\; 0\big) + \rho_2\big((q_i - q_j)D,\; 0\big)\,\delta_{j-q_jL,\; i-q_iL}. \qquad (5.11)$$


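From (5.11), it is easily seen that the $ij$th element of $R_{xx}$ is a function of only $q_j - q_i$, $j - q_jL$, and $i - q_iL$. Hence for any $i, j \in \{1, 2, \ldots, ML\}$ and any integer $u$ such that $i + uL \in \{1, 2, \ldots, ML\}$ and $j + uL \in \{1, 2, \ldots, ML\}$, $(R_{xx})_{i+uL,\,j+uL} = (R_{xx})_{i,j}$. That is, $R_{xx}$ can be expressed as

$$R_{xx} = \begin{bmatrix} \alpha_0 & \alpha_1 & \alpha_2 & \cdots & \alpha_{M-1} \\ \alpha_{-1} & \alpha_0 & \alpha_1 & \cdots & \alpha_{M-2} \\ \vdots & & \ddots & & \vdots \\ \alpha_{1-M} & \cdots & & \alpha_{-1} & \alpha_0 \end{bmatrix}, \qquad (5.12)$$

where each $\alpha_i$ is an $L \times L$ matrix. In other words, $R_{xx}$ is a "block" Toeplitz matrix consisting of $M^2$ $L \times L$ submatrices arranged in a Toeplitz form. An immediate consequence of (5.12) is that at most $(2M-1)L^2$ of the $(ML)^2$ elements of $R_{xx}$ are distinct. Furthermore, since $R_{xx}$ is Hermitian, at most $(M-1)L^2 + (L+1)L/2$ elements of $R_{xx}$ need be computed.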
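The index bookkeeping of (5.6)-(5.12) can be checked symbolically. In the sketch below (an illustrative device, not part of the original derivation; all names are hypothetical), each entry of $R_{xx}$ is replaced by a label recording only what (5.11) says it depends on, namely $q_j - q_i$ and the sensor indices $i - q_iL$ and $j - q_jL$, or just their difference when each $\alpha_i$ is Toeplitz. Counting distinct labels, and label classes under Hermitian symmetry, reproduces the element counts stated above and in the following paragraph.

```python
def q(m, L):
    """q_m = [(m - 1)/L] for a 1-based element index m, as in (5.6)."""
    return (m - 1) // L


def sensor(m, L):
    """Sensor index m - q_m L feeding element m of X(t), as in (5.6)."""
    return m - q(m, L) * L


def symbolic_R(M, L, toeplitz_blocks=False):
    """Label every entry of R_xx by the quantities (5.11) says it depends on."""
    p = M * L

    def label(i, j):
        dq, si, sj = q(j, L) - q(i, L), sensor(i, L), sensor(j, L)
        # With the special array geometry only the sensor-index difference matters.
        return (dq, sj - si) if toeplitz_blocks else (dq, si, sj)

    return [[label(i, j) for j in range(1, p + 1)] for i in range(1, p + 1)]


def is_block_toeplitz(R, L):
    """Check (R)_{i+L, j+L} = (R)_{i,j}, which implies the shift property for every u."""
    p = len(R)
    return all(R[i + L][j + L] == R[i][j] for i in range(p - L) for j in range(p - L))


def counts(M, L, toeplitz_blocks=False):
    """(distinct entries, entries needing computation when R_xx is Hermitian)."""
    R = symbolic_R(M, L, toeplitz_blocks)
    p = M * L
    distinct = {R[i][j] for i in range(p) for j in range(p)}
    # Hermitian symmetry: entry (j, i) is the conjugate of entry (i, j).
    classes = {frozenset((R[i][j], R[j][i])) for i in range(p) for j in range(p)}
    return len(distinct), len(classes)


print(counts(3, 2), counts(3, 2, toeplitz_blocks=True))   # (20, 11) (15, 8)
```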

In obtaining (5.12), no special array geometry is assumed. It is not difficult to see from (5.11) that in case the array geometry is such that $p_{j-q_jL} - p_{i-q_iL}$ is constant for all $i, j$ such that $(j - q_jL) - (i - q_iL)$ is constant (e.g., equally spaced sensors along a line), then each of the $\alpha_i$ in (5.12) is a Toeplitz matrix. In this case, at most $(2M-1)(2L-1)$ elements of $R_{xx}$ are distinct. Since $R_{xx}$ is Hermitian, only $(M-1)(2L-1) + L$ elements of $R_{xx}$ need be computed.

Finally, in case $L = 1$, (5.12) implies that $R_{xx}$ is a Toeplitz matrix having at most $2M - 1$ distinct elements. Since $R_{xx}$ is Hermitian, only $M$ elements of $R_{xx}$ need be computed. The case $L = 1$ reduces the filtering problem to the filtering of wide-sense stationary processes with an FIR (for Finite Impulse Response; really, finite-duration unit pulse response) filter, the Toeplitz nature of which has been exploited in [63].


B. Toeplitz $R_{xx}$

In case $R_{xx}$ is a Toeplitz matrix, several efficient algorithms are available for either the solution of (5.1) or the computation of $R_{xx}^{-1}$. Levinson [64] was apparently the first to develop an efficient algorithm for the solution of (5.1) in case $R_{xx}$ is a symmetric Toeplitz matrix. Siddiqui [65] presented a simplified solution for $R_{xx}^{-1}$ for the more specialized case that $R_{xx}$ is a covariance matrix for a stable wide-sense stationary scalar discrete-time autoregressive process of order $k$. Trench [66] obtained an algorithm for finding $R_{xx}^{-1}$ requiring only that $R_{xx}$ be Toeplitz and strongly nonsingular. Zohar [67] presented a much simplified derivation for the result of Trench [66]. Preis [68] explicitly presented the algorithm of Trench for the case that $R_{xx}$ is symmetric and discussed its occurrence in antenna problems. A Fortran routine based on [68] is presented in [63].

Zohar [69] makes use of the algorithm of Trench to solve a set of Toeplitz linear equations. Markel and Gray [70] obtain a similar result from a different viewpoint. Farden [71] makes use of the techniques used by Zohar [69] to derive a more efficient algorithm in case $R_{xx}$ is Hermitian Toeplitz and $P$ is a "Hermitian vector." A more general version of this latter result is presented below, which provides efficient algorithms for the solution to (5.1) in case $R_{xx}$ is a Toeplitz matrix.

Since  the  techniques  used  in  this  section  are  inherently  related  to 
those  used  by  Zohar  [69],  an  attempt  will  be  made  to  follow  the  same 
notational  conventions.  Greek  letters  are  used  for  scalars,  capital 
letters  for  square  matrices,  and  lower-case  letters  for  column  matrices. 
Subscripts  used  on  matrices  will  denote  the  number  of  elements  in  one 
column  of  the  matrix. 

With a slight breach of previous notation, define $R_p = R_{xx}$, $w_p = w$, and $d_p = P$ in (5.1). The algorithms developed here make use of Phase 1 of the Trench algorithm [67], which requires that $R_p$ be strongly nonsingular, i.e., that all leading principal minors of $R_p$ be nonzero. In particular, the (common) diagonal element of $R_p$ is then nonzero, so it will be assumed that (5.1) has been normalized so that $R_p$ has ones along its main diagonal. It is noted that any nonsingular covariance matrix of interest in the present work is both Hermitian and strongly nonsingular (positive definite implies strongly nonsingular); however, since the results are of more general interest, it will be assumed for now only that $R_p$ is Toeplitz and strongly nonsingular.

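Consider the system of equations $R_p w_p = d_p$, where $R_p$ is a $p \times p$ Toeplitz matrix normalized such that $(R_p)_{i,i} = 1$, for $i = 1, 2, \ldots, p$. Define the sequences $\{\eta_i\}$ and $\{\gamma_i\}$ such that $d_p' = (\eta_{(p+1)/2}, \ldots, \eta_2, \gamma_1, \gamma_2, \ldots, \gamma_{(p+1)/2})$ for $p$ odd and $d_p' = (\eta_{p/2}, \ldots, \eta_1, \gamma_1, \ldots, \gamma_{p/2})$ for $p$ even. Then, for $p$ even or odd, $d_{i+2}' = (\eta_{[(i+3)/2]},\, d_i',\, \gamma_{[(i+3)/2]})$ for $0 \le i \le p-2$ (with $i$ of the same parity as $p$), where $[\cdot]$ denotes the largest integer part, $d_0$ is empty, and $d_1 = \gamma_1$.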


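The Toeplitz nature of $R_p$ enables one to write $(0 \le i \le p-2)$

$$R_{i+2} = \begin{bmatrix} 1 & a_{i+1}' \\ b_{i+1} & R_{i+1} \end{bmatrix} = \begin{bmatrix} R_{i+1} & \tilde a_{i+1} \\ \tilde b_{i+1}' & 1 \end{bmatrix}, \qquad (5.13)$$

where $a_{i+1}'$ and $b_{i+1}$ hold the off-diagonal elements of the first row and first column of $R_{i+2}$, and the ~ denotes the reversed ordering of the elements, e.g., $\tilde b_{i+1} = ((b_{i+1})_{i+1}, \ldots, (b_{i+1})_1)'$. Clearly, (5.13) may be rewritten as

$$R_{i+2} = \begin{bmatrix} 1 & a_i' & (a_{i+1})_{i+1} \\ b_i & R_i & \tilde a_i \\ (b_{i+1})_{i+1} & \tilde b_i' & 1 \end{bmatrix}. \qquad (5.14)$$

Defining $R_{i+2} w_{i+2} = d_{i+2}$ $(i \le p-2)$, it follows that

$$R_{i+2} w_{i+2} - R_{i+2} \begin{bmatrix} 0 \\ w_i \\ 0 \end{bmatrix} = \begin{bmatrix} \theta_i \\ 0_i \\ \phi_i \end{bmatrix}, \qquad (5.15)$$

where $\theta_i = \eta_{[(i+3)/2]} - a_i' w_i$, $\phi_i = \gamma_{[(i+3)/2]} - \tilde b_i' w_i$, and $0_i$ is an $i \times 1$ column matrix of zeros. Defining $B_{i+2} = R_{i+2}^{-1}$ yields

$$w_{i+2} = \begin{bmatrix} 0 \\ w_i \\ 0 \end{bmatrix} + B_{i+2} \begin{bmatrix} \theta_i \\ 0_i \\ \phi_i \end{bmatrix}. \qquad (5.16)$$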

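It is not difficult to see from (5.17) that the first and last columns of $B_{i+2}$ may be expressed as

$$B_{i+2} \begin{bmatrix} 1 \\ 0_{i+1} \end{bmatrix} = \lambda_{i+1}^{-1} \begin{bmatrix} 1 \\ g_{i+1} \end{bmatrix}, \qquad B_{i+2} \begin{bmatrix} 0_{i+1} \\ 1 \end{bmatrix} = \lambda_{i+1}^{-1} \begin{bmatrix} \tilde e_{i+1} \\ 1 \end{bmatrix}. \qquad (5.18)$$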


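Substituting (5.18) into (5.16) yields

$$w_{i+2} = \begin{bmatrix} 0 \\ w_i \\ 0 \end{bmatrix} + \theta_i \lambda_{i+1}^{-1} \begin{bmatrix} 1 \\ g_{i+1} \end{bmatrix} + \phi_i \lambda_{i+1}^{-1} \begin{bmatrix} \tilde e_{i+1} \\ 1 \end{bmatrix}. \qquad (5.19)$$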


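In order to make use of this result, Phase 1 of the Trench algorithm [67] can be applied:

Initial values: $e_1 = -a_1$, $g_1 = -b_1$, $\lambda_1 = 1 - a_1 b_1$.

Recursion of $\lambda$, $g$, $e$ $(1 \le i \le p-2)$:

$$\delta_i = -(a_{i+1})_{i+1} - e_i' \tilde a_i, \qquad \omega_i = -(b_{i+1})_{i+1} - b_i' \tilde g_i,$$

$$e_{i+1} = \begin{bmatrix} e_i + \delta_i \lambda_i^{-1} \tilde g_i \\ \delta_i \lambda_i^{-1} \end{bmatrix}, \qquad g_{i+1} = \begin{bmatrix} g_i + \omega_i \lambda_i^{-1} \tilde e_i \\ \omega_i \lambda_i^{-1} \end{bmatrix},$$

$$\lambda_{i+1} = \lambda_i - \lambda_i^{-1} \delta_i \omega_i.$$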

Finally, Phase 1 of the Trench algorithm and (5.19) may be combined by noting that

$$w_1 = \gamma_1 \qquad (5.20)$$

and

$$w_2 = (1 - a_1 b_1)^{-1} \begin{bmatrix} \eta_1 - a_1 \gamma_1 \\ \gamma_1 - b_1 \eta_1 \end{bmatrix}. \qquad (5.21)$$
Note that efficient use of the above result requires that $\delta_i$, $\omega_i$, $e_{i+1}$, $g_{i+1}$, and $\lambda_{i+1}$ be computed for all $i = 1, 2, \ldots, p-2$; whereas $w_{i+2}$ given by (5.19), $\theta_i = \eta_{[(i+3)/2]} - a_i' w_i$, and $\phi_i = \gamma_{[(i+3)/2]} - \tilde b_i' w_i$ need only be computed for $i = 1, 3, 5, \ldots, p-2$ if $p$ is odd and for $i = 2, 4, \ldots, p-2$ if $p$ is even. Several important specializations of the above result arise in case $R_p$ is Hermitian.
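A direct transcription of (5.19)-(5.21) together with Phase 1 of the Trench algorithm is sketched below for a real Toeplitz $R_p$ with unit diagonal. It is a sketch under stated assumptions, not the Fortran routine of the references: the function and argument names are hypothetical, the lists `a` and `b` hold the off-diagonal first row and first column of $R_p$, $R_p$ is assumed strongly nonsingular, and the even case starts from an empty $w_0$, for which one step of (5.19) reproduces (5.21). The residual $R_p w_p - d_p$ serves as a check.

```python
def solve_toeplitz(a, b, d):
    """Solve R_p w = d for a unit-diagonal Toeplitz R_p via (5.19)-(5.21) and Trench Phase 1.

    a[j-1] = (R_p)_{1, j+1} and b[j-1] = (R_p)_{j+1, 1} for j = 1..p-1.
    """
    p = len(d)
    if p == 1:
        return [d[0]]                                       # (5.20) with R_1 = 1
    # Phase 1 of the Trench algorithm: e_i, g_i, lambda_i for i = 1..p-1.
    E, G, LAM = {1: [-a[0]]}, {1: [-b[0]]}, {1: 1.0 - a[0] * b[0]}
    for i in range(1, p - 1):
        e, g, lam = E[i], G[i], LAM[i]
        delta = -a[i] - sum(e[j] * a[i - 1 - j] for j in range(i))   # -(a_{i+1})_{i+1} - e_i' a~_i
        omega = -b[i] - sum(b[j] * g[i - 1 - j] for j in range(i))   # -(b_{i+1})_{i+1} - b_i' g~_i
        E[i + 1] = [e[j] + delta / lam * g[i - 1 - j] for j in range(i)] + [delta / lam]
        G[i + 1] = [g[j] + omega / lam * e[i - 1 - j] for j in range(i)] + [omega / lam]
        LAM[i + 1] = lam - delta * omega / lam
    # Grow w_i outward from the centre of d_p, two elements per step, via (5.19).
    i = p % 2
    w = [d[(p - 1) // 2]] if i == 1 else []                 # w_1 = gamma_1, or the empty w_0
    while i < p:
        lo = (p - i - 2) // 2
        eta, gam = d[lo], d[lo + i + 1]                      # eta_[(i+3)/2], gamma_[(i+3)/2]
        theta = eta - sum(a[j] * w[j] for j in range(i))     # eta - a_i' w_i
        phi = gam - sum(b[i - 1 - j] * w[j] for j in range(i))   # gamma - b~_i' w_i
        e, g, lam = E[i + 1], G[i + 1], LAM[i + 1]
        first = [1.0] + g                                    # lambda_{i+1} x first column of B_{i+2}
        last = e[::-1] + [1.0]                               # lambda_{i+1} x last column of B_{i+2}
        padded = [0.0] + w + [0.0]
        w = [padded[r] + (theta * first[r] + phi * last[r]) / lam for r in range(i + 2)]
        i += 2
    return w


def toeplitz_matvec(a, b, w):
    """Compute R_p w for the unit-diagonal Toeplitz matrix described by a and b."""
    p = len(w)

    def entry(r, c):
        return 1.0 if r == c else (a[c - r - 1] if c > r else b[r - c - 1])

    return [sum(entry(r, c) * w[c] for c in range(p)) for r in range(p)]


a = [0.3, -0.1, 0.05, 0.02]            # hypothetical, diagonally dominant example
b = [0.2, 0.1, -0.05, 0.01]
d = [1.0, 2.0, 3.0, 4.0, 5.0]
w = solve_toeplitz(a, b, d)
print(max(abs(x - y) for x, y in zip(toeplitz_matvec(a, b, w), d)))   # residual near zero
```

Strict diagonal dominance of the example guarantees that every leading principal submatrix is nonsingular, i.e., that the strong nonsingularity assumption holds.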

If $R_p$ is a Hermitian Toeplitz matrix, then $b_{i+1} = \bar a_{i+1}$, $g_{i+1} = \bar e_{i+1}$, and $\omega_i = \bar\delta_i$ for all $i = 0, 1, \ldots, p-2$. Consequently, the above result simplifies. The simplification is summarized below.

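PROBLEM FORMULATION: $R_p w_p = d_p$,

$$R_{i+2} = \begin{bmatrix} 1 & a_{i+1}' \\ \bar a_{i+1} & R_{i+1} \end{bmatrix}, \qquad d_{i+2}' = (\eta_{[(i+3)/2]},\, d_i',\, \gamma_{[(i+3)/2]}) \qquad (0 \le i \le p-2).$$

Initial values: $e_1 = -a_1$, $\lambda_1 = 1 - |a_1|^2$,

$$w_1 = \gamma_1, \qquad w_2 = \lambda_1^{-1} \begin{bmatrix} \eta_1 - a_1 \gamma_1 \\ \gamma_1 - \bar a_1 \eta_1 \end{bmatrix}.$$

Recursive relations: Compute $\delta_i$, $e_{i+1}$, and $\lambda_{i+1}$ for $i = 1, 2, \ldots, p-2$. Compute $\theta_i$, $\phi_i$, and $w_{i+2}$ for $i = 1, 3, 5, \ldots, p-2$ if $p$ is odd and for $i = 2, 4, 6, \ldots, p-2$ if $p$ is even.

$$\delta_i = -(a_{i+1})_{i+1} - e_i' \tilde a_i,$$

$$e_{i+1} = \begin{bmatrix} e_i + \delta_i \lambda_i^{-1} \tilde{\bar e}_i \\ \delta_i \lambda_i^{-1} \end{bmatrix},$$

$$\lambda_{i+1} = \lambda_i - \lambda_i^{-1} |\delta_i|^2,$$

$$\theta_i = \eta_{[(i+3)/2]} - a_i' w_i, \qquad \phi_i = \gamma_{[(i+3)/2]} - \tilde{\bar a}_i' w_i,$$

$$w_{i+2} = \begin{bmatrix} 0 \\ w_i \\ 0 \end{bmatrix} + \theta_i \lambda_{i+1}^{-1} \begin{bmatrix} 1 \\ \bar e_{i+1} \end{bmatrix} + \phi_i \lambda_{i+1}^{-1} \begin{bmatrix} \tilde e_{i+1} \\ 1 \end{bmatrix}.$$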

The  above  results  offer  no  apparent  computational  advantage  (or  disad- 
vantage) over  the  results  of  Zohar  [69].  However,  the  following  results 
do  offer  significant  computational  savings  over  the  results  of  Zohar  [69]. 

Suppose $R_p$ is a Hermitian Toeplitz matrix and the elements of $d_p$ satisfy a Hermitian symmetry property, i.e., $\tilde d_p = \bar d_p$. Then $d_{i+2}' = (\eta_{[(i+3)/2]},\, d_i',\, \bar\eta_{[(i+3)/2]})$ $(0 \le i \le p-2)$, i.e., $\gamma_i = \bar\eta_i$. Consequently, $\tilde w_{i+2} = \bar w_{i+2}$ and $\phi_i = \bar\theta_i$. Hence, only $[(i+3)/2]$ elements of $w_{i+2}$ need to be computed using the recursive relationship given above, the remaining elements being obtained from the relationship $\tilde w_{i+2} = \bar w_{i+2}$. Making use of these facts, the above algorithm for $R_p$ Hermitian and $\tilde d_p = \bar d_p$ requires approximately $1.5p^2$ additions and $1.5p^2$ multiplications to compute the desired solution, $w_p$. This compares with $2p^2$ for the case that $R_p$ is Hermitian and $d_p$ arbitrary.


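In case $R_p$, $d_p$ (and hence $w_p$) are real, $R_p$ is symmetric, and $\tilde d_p = d_p$, an even further reduction in computational requirements results. For this case the recursion for $w_{i+2}$ becomes

$$w_{i+2} = \begin{bmatrix} 0 \\ w_i \\ 0 \end{bmatrix} + \theta_i \lambda_{i+1}^{-1} \left( \begin{bmatrix} 1 \\ e_{i+1} \end{bmatrix} + \begin{bmatrix} \tilde e_{i+1} \\ 1 \end{bmatrix} \right), \qquad (5.22)$$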
and the inner product $a_i' w_i$ in the expression for $\theta_i$ may be computed as

$$a_i' w_i = \sum_{\ell=1}^{i/2} (w_i)_\ell \big((a_i)_\ell + (a_i)_{i+1-\ell}\big) \qquad (5.23)$$

for $i$ even and

$$a_i' w_i = \sum_{\ell=1}^{(i-1)/2} (w_i)_\ell \big((a_i)_\ell + (a_i)_{i+1-\ell}\big) + (w_i)_{(i+1)/2}\,(a_i)_{(i+1)/2} \qquad (5.24)$$

for $i$ odd. Making use of these expressions, this specialized algorithm requires approximately $1.5p^2$ additions and $1.25p^2$ multiplications. A slightly different form of (5.22) can be easily obtained as


$$w_{i+2} = \begin{bmatrix} 0 \\ w_i \\ 0 \end{bmatrix} + \frac{\theta_i}{\lambda_i - \delta_i} \begin{bmatrix} 1 \\ e_i + \tilde e_i \\ 1 \end{bmatrix}. \qquad (5.25)$$

This final expression (5.25) is slightly more efficient than (5.22). A Fortran routine for this specialized algorithm making use of (5.23)-(5.25) is presented in [72].
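For real symmetric $R_p$ and reversal-symmetric $d_p$, the update (5.25) together with the folded inner products (5.23)-(5.24) gives a very short solver. In the sketch below (hypothetical names; real data assumed), the even case starts from an empty $w_0$ with $e_0$ empty, $\lambda_0 = 1$, and $\delta_0 = -(a_1)_1$; under this assumed convention (5.25) reproduces the $p = 2$ solution $w_2 = \eta_1 (1 + a_1)^{-1} (1, 1)'$.

```python
def solve_symmetric_toeplitz(a, d):
    """Solve R_p w = d for real symmetric unit-diagonal Toeplitz R_p and reversal-symmetric d.

    a[j-1] = (R_p)_{1, j+1}; uses (5.23)-(5.25) and the real symmetric Trench recursion.
    """
    p = len(d)
    e, lam = [], 1.0                                    # e_0 (empty) and lambda_0 = 1
    i = p % 2
    w = [d[(p - 1) // 2]] if i == 1 else []             # w_1 = gamma_1, or the empty w_0
    for k in range(p - 1):                              # k plays the role of i in the text
        delta = -a[k] - sum(e[j] * a[k - 1 - j] for j in range(k))   # delta_k
        if k == i:                                      # w_{k+2} is needed at this parity
            half = k // 2
            aw = sum(w[j] * (a[j] + a[k - 1 - j]) for j in range(half))   # (5.23)
            if k % 2 == 1:
                aw += w[half] * a[half]                 # middle term of (5.24)
            theta = d[(p - k - 2) // 2] - aw            # theta_k = eta - a_k' w_k
            c = theta / (lam - delta)                   # coefficient in (5.25)
            bump = [1.0] + [e[j] + e[k - 1 - j] for j in range(k)] + [1.0]   # (1, e_k + e~_k, 1)
            padded = [0.0] + w + [0.0]
            w = [padded[j] + c * bump[j] for j in range(k + 2)]
            i += 2
        e = [e[j] + delta / lam * e[k - 1 - j] for j in range(k)] + [delta / lam]   # e_{k+1}
        lam = lam - delta * delta / lam                                              # lambda_{k+1}
    return w


def sym_toeplitz_matvec(a, w):
    """Compute R_p w for the symmetric unit-diagonal Toeplitz matrix described by a."""
    p = len(w)
    return [sum((1.0 if r == c else a[abs(r - c) - 1]) * w[c] for c in range(p)) for r in range(p)]


a = [0.5, 0.25, 0.125, 0.0625]          # hypothetical correlations 0.5**|k|
d = [1.0, 2.0, 3.0, 2.0, 1.0]
w = solve_symmetric_toeplitz(a, d)
print(w, max(abs(x - y) for x, y in zip(sym_toeplitz_matvec(a, w), d)))
```

The returned $w_p$ is itself reversal-symmetric, which is the property that (5.23) and (5.24) exploit.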

C. $R_{xx}$ Having $M^2$ $L \times L$ Submatrices Arranged in Toeplitz Form

In this section, the solution of (5.1) for the general situation with $R_{xx}$ expressed by (5.12) is given. An important by-product of this development will be an efficient algorithm for computing $R_{xx}^{-1}$. Again, all covariance matrices of interest are Hermitian; however, since the results seem to be of more widespread interest, the Hermitian restriction will not be made. As assumed in (5.12), let $p = ML$. Throughout this section, capital letters will be used to denote square matrices and lower case letters will be used to denote "vectors" with $L \times L$ matrix "elements." Subscripts on these quantities will be used to denote the number of elements in each column of the matrix. Greek letters will be used to denote $L \times L$ matrices. These conventions will be violated with $R_{ML} = R_{xx}$, $w_{ML} = w$, and $d_{ML} = P$ as in the previous section. The $\ell \times \ell$ identity matrix will be denoted by $I_\ell$, and a matrix with all zero elements will be denoted by $0$, with its dimensions as subscripts (e.g., $0_{L,L}$).

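From (5.12), $R_{ML}$ can be partitioned in terms of $\alpha_0$, the blocks $\alpha_1, \alpha_2, \ldots, \alpha_{M-1}$, and the blocks $\alpha_{-1}, \ldots, \alpha_{1-M}$, where $*$ is used to denote an obvious matrix operation similar to matrix transposition. Then $R_{(m+1)L} B_{(m+1)L} = I_{(m+1)L}$ yields (5.28). Solving (5.28) to obtain $A_{mL}$ and substituting into (5.27) yields (5.29). Define the "block exchange matrix" $E^*_\ell$ as the matrix having $L \times L$ identity blocks $I_L$ along its block antidiagonal and zero blocks elsewhere. Pre- and post-multiplying (5.33) by $E^*_{(m+1)L}$ and comparing the result with (5.29), one obtains (since $E^*_{mL} B^*_{mL} E^*_{mL} = B_{mL}$)

$$B_{(m+1)L} = \begin{bmatrix} \beta_m & \beta_m e'_{mL} \\ f_{mL}\beta_m & B_{mL} + f_{mL}\beta_m e'_{mL} \end{bmatrix} = \begin{bmatrix} B_{mL} + h_{mL}\gamma_m g''_{mL} & h_{mL}\gamma_m \\ \gamma_m g''_{mL} & \gamma_m \end{bmatrix}. \qquad (5.34)$$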


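Using techniques analogous to those used by Zohar [67], it can be shown that all of the elements of $B_{(m+1)L}$ can be generated from $\beta_m$, $e'_{mL}$, $f_{mL}$, $\gamma_m$, $g''_{mL}$, and $h_{mL}$. Denote the $ij$th $L \times L$ "block" of a matrix, say $A_{mL}$, by $(A_{mL})_{i,j}$. From the first equality in (5.34), it follows directly that

$$(B_{(m+1)L})_{1,1} = \beta_m, \qquad (5.35)$$

$$(B_{(m+1)L})_{i+1,1} = (f_{mL})_{i,1}\,\beta_m, \quad 1 \le i \le m, \qquad (5.36)$$

$$(B_{(m+1)L})_{1,i+1} = \beta_m\,(e'_{mL})_{1,i}, \quad 1 \le i \le m, \qquad (5.37)$$

$$(B_{(m+1)L})_{i+1,j+1} = (B_{mL})_{i,j} + (f_{mL}\beta_m e'_{mL})_{i,j}, \quad 1 \le i, j \le m. \qquad (5.38)$$

From the second equality in (5.34), it follows that

$$(B_{(m+1)L})_{i,j} = (B_{mL})_{i,j} + (h_{mL}\gamma_m g''_{mL})_{i,j}, \quad 1 \le i, j \le m. \qquad (5.39)$$

Combining (5.38) and (5.39) to eliminate $(B_{mL})_{i,j}$,

$$(B_{(m+1)L})_{i+1,j+1} = (B_{(m+1)L})_{i,j} + (f_{mL}\beta_m e'_{mL})_{i,j} - (h_{mL}\gamma_m g''_{mL})_{i,j}. \qquad (5.40)$$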


Equation (5.40) is a recursive relationship for generating the elements of $B_{(m+1)L}$ starting with the initial conditions given by (5.35)-(5.37). It is important to note that it is the appearance of the same matrix $B_{mL}$ in both equalities of (5.34) that enabled the derivation of (5.40).
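To make the fill recursion concrete, the following Python sketch treats only the scalar case $L = 1$ (an ordinary Toeplitz matrix, assumed here to have $(R)_{ij} = \alpha_{j-i}$, which matches the partition (5.26)). It computes $\beta_m$, $e'_{mL}$, $f_{mL}$, $\gamma_m$, $g''_{mL}$, and $h_{mL}$ directly from $B_{mL}$ using the expressions (5.41)-(5.44), rebuilds $B_{(m+1)L}$ from them with (5.35)-(5.37) and (5.40), and compares the result with a directly computed inverse. Exact rational arithmetic (`fractions.Fraction`) keeps the comparison exact; the helper names and the test lags are illustrative assumptions, not part of the original development.

```python
from fractions import Fraction


def inverse(A):
    """Exact Gauss-Jordan inverse of a small nonsingular matrix."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def toeplitz(alpha, n):
    """Scalar Toeplitz matrix with (i, j) element alpha[j - i]."""
    return [[Fraction(alpha[j - i]) for j in range(n)] for i in range(n)]


def fill_inverse(beta, e, f, gamma, g, h):
    """Rebuild B_(m+1) from its border quantities via (5.35)-(5.37) and (5.40), L = 1."""
    m = len(e)
    B = [[None] * (m + 1) for _ in range(m + 1)]
    B[0][0] = beta                                   # (5.35)
    for i in range(m):
        B[i + 1][0] = f[i] * beta                    # (5.36)
        B[0][i + 1] = beta * e[i]                    # (5.37)
    for i in range(m):                               # row-major order, so B[i][j] is already known
        for j in range(m):
            B[i + 1][j + 1] = B[i][j] + f[i] * beta * e[j] - h[i] * gamma * g[j]   # (5.40)
    return B


# Hypothetical lags alpha_{-3}, ..., alpha_3 (strictly diagonally dominant, so every R_m is invertible).
alpha = {-3: 1, -2: -2, -1: 3, 0: 10, 1: 2, 2: -1, 3: 4}
m = 3
Bm = inverse(toeplitz(alpha, m))
a = [Fraction(alpha[k]) for k in range(1, m + 1)]          # a_m = (alpha_1, ..., alpha_m)
b = [Fraction(alpha[-k]) for k in range(1, m + 1)]         # b_m = (alpha_-1, ..., alpha_-m)
a_rev, b_rev = a[::-1], b[::-1]                            # reversed a_m and b_m (E* applied)
e = [-sum(a[k] * Bm[k][j] for k in range(m)) for j in range(m)]       # e' = -a'' B        (5.43)
f = [-sum(Bm[i][k] * b[k] for k in range(m)) for i in range(m)]       # f  = -B b          (5.44)
g = [-sum(b_rev[k] * Bm[k][j] for k in range(m)) for j in range(m)]   # g'' = -b'' E* B    (5.41)
h = [-sum(Bm[i][k] * a_rev[k] for k in range(m)) for i in range(m)]   # h  = -B E* a       (5.42)
beta = 1 / (alpha[0] + sum(x * y for x, y in zip(e, b)))              # beta_m
gamma = 1 / (alpha[0] + sum(x * y for x, y in zip(g, a_rev)))         # gamma_m

B_filled = fill_inverse(beta, e, f, gamma, g, h)
B_direct = inverse(toeplitz(alpha, m + 1))
print(B_filled == B_direct)   # True
```

Only the first row and column plus the six border quantities are needed; every interior element then follows from its upper-left neighbor.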
In order for the above result to be of practical use, recursive relationships for $\beta_m$, $e'_{mL}$, $f_{mL}$, $\gamma_m$, $g''_{mL}$, and $h_{mL}$ must be developed. Solving the upper right equation of (5.32) yields

$$g''_{mL} = -b''_{mL}E^*_{mL}B_{mL}. \qquad (5.41)$$

Solving the lower left equation of (5.32), and substituting $D_{mL} = B_{mL} + h_{mL}\gamma_m g''_{mL}$, yields $h_{mL}\gamma_m\alpha_0 = -B_{mL}E^*_{mL}a_{mL} - h_{mL}\gamma_m g''_{mL}E^*_{mL}a_{mL}$. Solving the upper left equation of (5.32) for $\gamma_m g''_{mL}E^*_{mL}a_{mL}$ and substituting yields

$$h_{mL} = -B_{mL}E^*_{mL}a_{mL}. \qquad (5.42)$$


The upper right equation of (5.28) is easily solved for

$$e'_{mL} = -a''_{mL}B_{mL}. \qquad (5.43)$$

Solving the lower left equation of (5.28) for $f_{mL}\beta_m\alpha_0$ and substituting $A_{mL} = B_{mL} + f_{mL}\beta_m e'_{mL}$ and $\beta_m e'_{mL}b_{mL} = I_L - \beta_m\alpha_0$ yields

$$f_{mL} = -B_{mL}b_{mL}. \qquad (5.44)$$


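Equations (5.41)-(5.44) can now be used with (5.34) to derive recursive relationships for $g''_{mL}$, $h_{mL}$, $e'_{mL}$, and $f_{mL}$. From (5.41) and the first equality in (5.34),

$$g''_{(m+1)L} = -\left(\alpha_{-(m+1)},\; b''_{mL}E^*_{mL}\right)\begin{bmatrix} \beta_m & \beta_m e'_{mL} \\ f_{mL}\beta_m & B_{mL} + f_{mL}\beta_m e'_{mL} \end{bmatrix},$$

or

$$g''_{(m+1)L} = (0_{L,L},\; g''_{mL}) - c_m\beta_m(I_L,\; e'_{mL}), \qquad (5.45)$$

where

$$c_m = \alpha_{-(m+1)} + b''_{mL}E^*_{mL}f_{mL}. \qquad (5.46)$$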


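From (5.42) and the first equality in (5.34),

$$h_{(m+1)L} = \begin{bmatrix} 0_{L,L} \\ h_{mL} \end{bmatrix} - \begin{bmatrix} I_L \\ f_{mL} \end{bmatrix}\beta_m\delta_m, \qquad (5.47)$$

where

$$\delta_m = \alpha_{m+1} + e'_{mL}E^*_{mL}a_{mL}. \qquad (5.48)$$

From (5.43) and the second equality in (5.34),

$$e'_{(m+1)L} = (e'_{mL},\; 0_{L,L}) - \eta_m\gamma_m(g''_{mL},\; I_L), \qquad (5.49)$$

where

$$\eta_m = \alpha_{m+1} + a''_{mL}h_{mL}. \qquad (5.50)$$

Also, from (5.44) and the second equality in (5.34),

$$f_{(m+1)L} = \begin{bmatrix} f_{mL} \\ 0_{L,L} \end{bmatrix} - \begin{bmatrix} h_{mL} \\ I_L \end{bmatrix}\gamma_m\omega_m, \qquad (5.51)$$

where

$$\omega_m = \alpha_{-(m+1)} + g''_{mL}b_{mL}. \qquad (5.52)$$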


Finally, equating the four $L \times L$ "corner blocks" of (5.34), and using (5.45)-(5.52), $\beta_m$ can be expressed as

$$\beta_m = \beta_{m-1} + \beta_{m-1}\delta_{m-1}\gamma_m c_{m-1}\beta_{m-1}, \qquad (5.53)$$

and, in addition,

$$\beta_m\eta_{m-1}\gamma_{m-1} = \beta_{m-1}\delta_{m-1}\gamma_m. \qquad (5.54)$$

Substituting (5.54) into (5.53) and solving for $\beta_m$ yields

$$\beta_m = \beta_{m-1}\left(I_L - \eta_{m-1}\gamma_{m-1}c_{m-1}\beta_{m-1}\right)^{-1}. \qquad (5.55)$$

Furthermore, $\gamma_m$ can be expressed as

$$\gamma_m = \gamma_{m-1}\left(I_L + \omega_{m-1}\beta_m\eta_{m-1}\gamma_{m-1}\right). \qquad (5.56)$$

Now, consider the equation $R_{mL}w_{mL} = d_{mL}$, for $m = 1, 2, \ldots, M$. Recall that $w_{mL}$ and $d_{mL}$ are $mL \times 1$ matrices. Define the $L \times 1$ matrix $\Delta_m$ by $(\Delta_m)_i = (d_{mL})_{(m-1)L+i}$ for $i = 1, 2, \ldots, L$. Recall that the block Toeplitz nature of $R_{mL}$ enables one to write

$$R_{(m+1)L} = \begin{bmatrix} \alpha_0 & a''_{mL} \\ b_{mL} & R_{mL} \end{bmatrix} = \begin{bmatrix} R_{mL} & E^*_{mL}a_{mL} \\ b''_{mL}E^*_{mL} & \alpha_0 \end{bmatrix}. \qquad (5.57)$$


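Using the last expression for $R_{(m+1)L}$ in (5.57), it is easily shown that

$$R_{(m+1)L}\begin{bmatrix} w_{mL} \\ 0_{L,1} \end{bmatrix} = d_{(m+1)L} - \begin{bmatrix} 0_{mL,1} \\ \Delta_{m+1} - b''_{mL}E^*_{mL}w_{mL} \end{bmatrix}. \qquad (5.58)$$

Defining $\Gamma_m = \Delta_{m+1} - b''_{mL}E^*_{mL}w_{mL}$ and premultiplying both sides of (5.58) by $B_{(m+1)L}$ yields

$$w_{(m+1)L} = \begin{bmatrix} w_{mL} \\ 0_{L,1} \end{bmatrix} + B_{(m+1)L}\begin{bmatrix} 0_{mL,1} \\ \Gamma_m \end{bmatrix}. \qquad (5.59)$$

Making use of the second equality in (5.34),

$$w_{(m+1)L} = \begin{bmatrix} w_{mL} \\ 0_{L,1} \end{bmatrix} + \begin{bmatrix} h_{mL} \\ I_L \end{bmatrix}\gamma_m\Gamma_m. \qquad (5.60)$$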


Once initial conditions are found, the development will be complete. From (5.28) with $m = 1$, it follows that $e'_{1L} = -\alpha_1\alpha_0^{-1}$, $f_{1L} = -\alpha_0^{-1}\alpha_{-1}$, and $\beta_1 = (\alpha_0 + e'_{1L}\alpha_{-1})^{-1}$. From (5.32), with $m = 1$, $g''_{1L} = -\alpha_{-1}\alpha_0^{-1}$ and $h_{1L} = -\alpha_0^{-1}\alpha_1$. With $m = 1$, (5.34) yields $\gamma_1 = \alpha_0^{-1} + f_{1L}\beta_1 e'_{1L}$, so that the set of initial conditions is complete. The complete algorithm is summarized below.


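PROBLEM FORMULATION: $R_{ML}w_{ML} = d_{ML}$,

$$R_{ML} = \begin{bmatrix} \alpha_0 & a''_{(M-1)L} \\ b_{(M-1)L} & R_{(M-1)L} \end{bmatrix},$$

$a_{mL} = (\alpha_1, \alpha_2, \ldots, \alpha_m)''$, $b_{mL} = (\alpha_{-1}, \alpha_{-2}, \ldots, \alpha_{-m})''$, $(1 \le m \le M-1)$, $w_{ML} = ?$

Initial values:

$$\begin{aligned}
&e'_{1L} = -\alpha_1\alpha_0^{-1}, \quad f_{1L} = -\alpha_0^{-1}\alpha_{-1}, \quad g''_{1L} = -\alpha_{-1}\alpha_0^{-1}, \quad h_{1L} = -\alpha_0^{-1}\alpha_1, \\
&\beta_1 = (\alpha_0 + e'_{1L}\alpha_{-1})^{-1}, \quad \gamma_1 = \alpha_0^{-1} + f_{1L}\beta_1 e'_{1L}, \quad w_{1L} = \alpha_0^{-1}d_{1L}, \quad \Gamma_1 = \Delta_2 - \alpha_{-1}w_{1L}, \\
&w_{2L} = \begin{bmatrix} w_{1L} \\ 0_{L,1} \end{bmatrix} + \begin{bmatrix} h_{1L} \\ I_L \end{bmatrix}\gamma_1\Gamma_1.
\end{aligned}$$

Recursive relations: $(1 \le m \le M-2)$,

$$\begin{aligned}
c_m &= \alpha_{-(m+1)} + b''_{mL}E^*_{mL}f_{mL}, \\
g''_{(m+1)L} &= (0_{L,L},\; g''_{mL}) - c_m\beta_m(I_L,\; e'_{mL}), \\
\delta_m &= \alpha_{m+1} + e'_{mL}E^*_{mL}a_{mL}, \\
h_{(m+1)L} &= \begin{bmatrix} 0_{L,L} \\ h_{mL} \end{bmatrix} - \begin{bmatrix} I_L \\ f_{mL} \end{bmatrix}\beta_m\delta_m, \\
\eta_m &= \alpha_{m+1} + a''_{mL}h_{mL}, \\
e'_{(m+1)L} &= (e'_{mL},\; 0_{L,L}) - \eta_m\gamma_m(g''_{mL},\; I_L), \\
\omega_m &= \alpha_{-(m+1)} + g''_{mL}b_{mL}, \\
f_{(m+1)L} &= \begin{bmatrix} f_{mL} \\ 0_{L,L} \end{bmatrix} - \begin{bmatrix} h_{mL} \\ I_L \end{bmatrix}\gamma_m\omega_m, \\
\beta_{m+1} &= \beta_m\left(I_L - \eta_m\gamma_m c_m\beta_m\right)^{-1}, \\
\gamma_{m+1} &= \gamma_m\left(I_L + \omega_m\beta_{m+1}\eta_m\gamma_m\right), \\
\Gamma_{m+1} &= \Delta_{m+2} - b''_{(m+1)L}E^*_{(m+1)L}w_{(m+1)L}, \\
w_{(m+2)L} &= \begin{bmatrix} w_{(m+1)L} \\ 0_{L,1} \end{bmatrix} + \begin{bmatrix} h_{(m+1)L} \\ I_L \end{bmatrix}\gamma_{m+1}\Gamma_{m+1}.
\end{aligned}$$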
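As a check on the summary above, here is a minimal Python sketch of the scalar case $L = 1$, where every block is a number, the double-prime operation reduces to ordinary transposition, and $E^*_{mL}$ simply reverses a vector. The function name `solve_block_toeplitz_l1` and the test data are hypothetical. The sketch assumes $M \ge 2$, $(R)_{ij} = \alpha_{j-i}$, and that every leading submatrix $R_{mL}$ is nonsingular (which the recursion requires); exact `Fraction` arithmetic lets $R_{ML}w_{ML} = d_{ML}$ be checked exactly.

```python
from fractions import Fraction


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def solve_block_toeplitz_l1(alpha, d):
    """Scalar (L = 1) sketch of the general recursion; R[i][j] = alpha[j - i], len(d) = M >= 2."""
    al = {k: Fraction(v) for k, v in alpha.items()}
    M, a0 = len(d), al[0]
    # Initial values (m = 1)
    e = [-al[1] / a0]                    # e'_1
    f = [-al[-1] / a0]                   # f_1
    g = [-al[-1] / a0]                   # g''_1
    h = [-al[1] / a0]                    # h_1
    beta = 1 / (a0 + e[0] * al[-1])      # beta_1
    gamma = 1 / a0 + f[0] * beta * e[0]  # gamma_1
    w = [Fraction(d[0]) / a0]            # w_1
    for m in range(1, M):
        b_rev = [al[-(m - i)] for i in range(m)]        # (alpha_-m, ..., alpha_-1) = b'' E*
        Gam = Fraction(d[m]) - dot(b_rev, w)            # Gamma_m
        w = [wi + hi * gamma * Gam for wi, hi in zip(w, h)] + [gamma * Gam]   # (5.60)
        if m == M - 1:
            break
        a = [al[i] for i in range(1, m + 1)]            # a_m
        b = [al[-i] for i in range(1, m + 1)]           # b_m
        c = al[-(m + 1)] + dot(b_rev, f)                # (5.46)
        delta = al[m + 1] + dot(e, a[::-1])             # (5.48)
        eta = al[m + 1] + dot(a, h)                     # (5.50)
        omega = al[-(m + 1)] + dot(g, b)                # (5.52)
        # All four updates use the order-m quantities on the right-hand side.
        g, h, e, f = (
            [x - c * beta * y for x, y in zip([0] + g, [1] + e)],       # (5.45)
            [x - y * beta * delta for x, y in zip([0] + h, [1] + f)],   # (5.47)
            [x - eta * gamma * y for x, y in zip(e + [0], g + [1])],    # (5.49)
            [x - y * gamma * omega for x, y in zip(f + [0], h + [1])],  # (5.51)
        )
        beta = beta / (1 - eta * gamma * c * beta)          # (5.55), gives beta_{m+1}
        gamma = gamma * (1 + omega * beta * eta * gamma)    # (5.56), uses beta_{m+1}
    return w


alpha = {-4: 1, -3: 2, -2: -3, -1: 4, 0: 20, 1: 5, 2: -1, 3: 2, 4: 3}   # hypothetical lags
d = [1, 2, 3, 4, 5]
w = solve_block_toeplitz_l1(alpha, d)
R = [[alpha[j - i] for j in range(5)] for i in range(5)]
print(all(dot(R[i], w) == d[i] for i in range(5)))   # True
```

For $L > 1$ the same loop applies with each product read as an $L \times L$ matrix product (order matters) and each division replaced by a matrix inverse.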

Note that if it is desired to compute the inverse of $R_{ML}$, the above algorithm can be used with the expressions for $\Gamma_{m+1}$ and $w_{(m+2)L}$ deleted. After $\beta_{M-1}$, $e'_{(M-1)L}$, $f_{(M-1)L}$, $g''_{(M-1)L}$, $h_{(M-1)L}$, and $\gamma_{M-1}$ have been computed, equations (5.35)-(5.37) and (5.40) with $m = M-1$ can be used to generate $B_{ML}$.

The above algorithm requires approximately $6ML^2 + 2ML$ storage locations, which can represent a considerable savings for large $M$. The algorithm requires approximately $4M^2$ matrix ($L \times L$) multiplications, $4M^2$ matrix ($L \times L$) additions, and $M$ matrix ($L \times L$) inversions. Considering that an $L \times L$ matrix multiplication requires approximately $L^3$ scalar additions and $L^3$ scalar multiplications, and that standard routines for an $L \times L$ matrix inversion require approximately $L^3$ multiplications and additions, the above algorithm requires approximately $4M^2L^3$ operations compared with approximately $(ML)^3$ for standard algorithms. These remarks, of course, do not include the operations necessary to actually compute $B_{ML}$, which requires an additional $4M^2L^3$ operations (approximately). It can certainly be concluded that the above algorithm can offer a substantial computational advantage over standard algorithms for large $M$. Recall that $M$ is the number of taps on a delay line realization of a FIR filter.
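The approximate counts above are easy to tabulate. The short Python function below is a hypothetical helper that simply encodes the formulas quoted in the text; it makes visible that the ratio of standard to recursive work is $(ML)^3 / (4M^2L^3) = M/4$, independent of $L$.

```python
def block_toeplitz_costs(M, L):
    """Approximate costs quoted in the text for an ML x ML block Toeplitz system.

    Returns (storage_words, recursive_ops, standard_ops, extra_ops_for_inverse).
    """
    storage = 6 * M * L**2 + 2 * M * L      # storage estimate for the recursion
    recursive = 4 * M**2 * L**3             # ~4M^2 L x L products and additions at ~L^3 each
    standard = (M * L) ** 3                 # general dense solver
    inverse_fill = 4 * M**2 * L**3          # additional work to form B_ML via (5.35)-(5.40)
    return storage, recursive, standard, inverse_fill


costs = block_toeplitz_costs(32, 4)
print(costs)                   # (3328, 262144, 2097152, 262144)
print(costs[2] // costs[1])    # 8, i.e. M / 4
```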

As noted previously, all covariance matrices of interest in this work are Hermitian (in fact, most are real and symmetric). Consequently, the simplification of the above algorithm for Hermitian block Toeplitz $R_{xx}$ will now be undertaken.

For the case that $R_{ML}$ is Hermitian, it follows easily that $\gamma'_m = \gamma_m$, $\beta'_m = \beta_m$, $e'_{mL} = f'_{mL}$, and $g''_{mL} = h'_{mL}$. Substituting these identities into (5.34) easily yields

$$B_{(m+1)L} = \begin{bmatrix} \beta_m & \beta_m f'_{mL} \\ f_{mL}\beta_m & B_{mL} + f_{mL}\beta_m f'_{mL} \end{bmatrix} = \begin{bmatrix} B_{mL} + h_{mL}\gamma_m h'_{mL} & h_{mL}\gamma_m \\ \gamma_m h'_{mL} & \gamma_m \end{bmatrix}. \qquad (5.61)$$


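Making the required substitutions into the general algorithm, the simplified algorithm for the case at hand is easily obtained. A summary of the algorithm for Hermitian $R_{ML}$ is presented below.

PROBLEM FORMULATION: $R_{ML}w_{ML} = d_{ML}$,

$$R_{ML} = \begin{bmatrix} \alpha_0 & b'_{(M-1)L} \\ b_{(M-1)L} & R_{(M-1)L} \end{bmatrix},$$

$b_{mL} = (\alpha_{-1}, \alpha_{-2}, \ldots, \alpha_{-m})''$, $(1 \le m \le M-1)$, $w_{ML} = ?$

$$\begin{aligned}
\beta_{m+1} &= \left(I_L - \beta_m\delta_m\gamma_m\omega_m\right)^{-1}\beta_m, \\
\gamma_{m+1} &= \gamma_m\left(I_L + \omega_m\beta_{m+1}\omega'_m\gamma_m\right), \\
\Gamma_{m+1} &= \Delta_{m+2} - b''_{(m+1)L}E^*_{(m+1)L}w_{(m+1)L}, \\
w_{(m+2)L} &= \begin{bmatrix} w_{(m+1)L} \\ 0_{L,1} \end{bmatrix} + \begin{bmatrix} h_{(m+1)L} \\ I_L \end{bmatrix}\gamma_{m+1}\Gamma_{m+1}.
\end{aligned}$$

In case the inverse of $R_{ML}$ is desired, the above algorithm can be used with the expressions for $\Gamma_{m+1}$ and $w_{(m+2)L}$ deleted. After $\beta_{M-1}$, $f_{(M-1)L}$, $h_{(M-1)L}$, and $\gamma_{M-1}$ have been computed, equations (5.35)-(5.37) and (5.40) with $m = M-1$, $e'_{(M-1)L} = f'_{(M-1)L}$, and $g''_{(M-1)L} = h'_{(M-1)L}$ can be used to generate $B_{ML}$. Efficient use of this procedure of course demands that the Hermitian property of $B_{ML}$ be used. The above algorithm requires approximately half the computations required by the previous algorithm.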


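Another interesting case for block Toeplitz matrices arises when $R_{ML}$ is persymmetric, i.e., symmetric about the main cross diagonal. Define the $m \times m$ exchange matrix $E_m$ by

$$E_m = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & & \vdots \\ 1 & 0 & \cdots & 0 \end{bmatrix}. \qquad (5.62)$$

A persymmetric matrix $A_m$ satisfies $E_mA'_mE_m = A_m$. It is easily shown that a block Toeplitz matrix as given by (5.12) is persymmetric if and only if $\alpha_l$ is persymmetric for all $l = 0, \pm 1, \ldots, \pm(M-1)$. Hence, in case $\alpha_l$ is Toeplitz for all $l = 0, \pm 1, \ldots, \pm(M-1)$, $R_{ML}$ is persymmetric. Also, it is easily shown that the inverse of a persymmetric matrix is persymmetric. With $R_{ML}$ persymmetric and given by (5.26), $B_{(m+1)L}$ can still be expressed as in (5.29). Computing $E_{(m+1)L}B'_{(m+1)L}E_{(m+1)L}$ from (5.29), $B_{(m+1)L}$ may be expressed as

$$B_{(m+1)L} = \begin{bmatrix} \beta_m & \beta_m e'_{mL} \\ f_{mL}\beta_m & B_{mL} + f_{mL}\beta_m e'_{mL} \end{bmatrix} = \begin{bmatrix} B_{mL} + E_{mL}e_{mL}\beta'_m f'_{mL}E_{mL} & E_{mL}e_{mL}\beta'_mE_L \\ E_L\beta'_m f'_{mL}E_{mL} & E_L\beta'_mE_L \end{bmatrix}, \qquad (5.63)$$

where $e_{mL}$ denotes $(e'_{mL})'$.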
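The two persymmetry facts used above (a block Toeplitz matrix built from Toeplitz blocks is persymmetric, and the inverse of a persymmetric matrix is persymmetric) can be checked numerically. The Python sketch below assumes real entries, takes block $(I, J)$ of the matrix to be $\alpha_{J-I}$ as in (5.26), and uses hypothetical $2 \times 2$ blocks; the helper names are illustrative. The test $E A' E = A$ is written elementwise as $A_{ij} = A_{n-1-j,\,n-1-i}$ (0-based indices).

```python
from fractions import Fraction


def inverse(A):
    """Exact Gauss-Jordan inverse of a small nonsingular matrix."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def is_persymmetric(A):
    """True if E A' E == A, i.e. A[i][j] == A[n-1-j][n-1-i] for every i, j."""
    n = len(A)
    return all(A[i][j] == A[n - 1 - j][n - 1 - i] for i in range(n) for j in range(n))


def block_toeplitz(blocks, M):
    """ML x ML matrix whose (I, J) block is blocks[J - I] (each block an L x L list)."""
    L = len(blocks[0])
    n = M * L
    return [[blocks[c // L - r // L][r % L][c % L] for c in range(n)] for r in range(n)]


# Hypothetical 2 x 2 Toeplitz blocks alpha_{-1}, alpha_0, alpha_1.
blocks = {-1: [[2, 0], [1, 2]], 0: [[6, 1], [2, 6]], 1: [[1, 3], [0, 1]]}
R = block_toeplitz(blocks, 2)
print(is_persymmetric(R), is_persymmetric(inverse(R)))   # True True

# Replacing alpha_0 by a block that is not persymmetric destroys the property.
skewed = dict(blocks)
skewed[0] = [[6, 1], [2, 7]]
print(is_persymmetric(block_toeplitz(skewed, 2)))        # False
```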


In full analogy to the development of (5.35)-(5.40),

$$(B_{(m+1)L})_{1,1} = \beta_m, \qquad (5.64)$$

$$(B_{(m+1)L})_{i+1,1} = (f_{mL})_{i,1}\,\beta_m, \quad 1 \le i \le m, \qquad (5.65)$$

$$(B_{(m+1)L})_{1,i+1} = \beta_m\,(e'_{mL})_{1,i}, \quad 1 \le i \le m, \qquad (5.66)$$

$$(B_{(m+1)L})_{i+1,j+1} = (B_{(m+1)L})_{i,j} + (f_{mL}\beta_m e'_{mL})_{i,j} - (E_{mL}e_{mL}\beta'_m f'_{mL}E_{mL})_{i,j}. \qquad (5.67)$$

Thus, in this case, all of the elements of $B_{ML}$ can be generated from $\beta_{M-1}$, $e'_{(M-1)L}$, and $f_{(M-1)L}$. Equations (5.43) and (5.44), which are still valid for this case, can be used with (5.63) to obtain recursive relationships for $\beta_m$, $e'_{mL}$, and $f_{mL}$. As in the previous section, it will be convenient to use the symbol $\sim$ to denote a reversal in the vertical ordering of the rows of a matrix, e.g., $\tilde a_{mL} = E_{mL}a_{mL}$ and $a'_{mL}E_{mL} = (E_{mL}a_{mL})' = \tilde a'_{mL}$. From (5.43) and the second equality in (5.63),

$$e'_{(m+1)L} = (e'_{mL},\; 0_{L,L}) - \zeta_m\beta'_m(\tilde f'_{mL},\; E_L), \qquad (5.68)$$

where

$$\zeta_m = \alpha_{m+1}E_L + a''_{mL}\tilde e_{mL}. \qquad (5.69)$$


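From (5.44) and the second equality in (5.63),

$$f_{(m+1)L} = \begin{bmatrix} f_{mL} \\ 0_{L,L} \end{bmatrix} - \begin{bmatrix} \tilde e_{mL} \\ E_L \end{bmatrix}\beta'_m\phi_m, \qquad (5.70)$$

where

$$\phi_m = E_L\alpha_{-(m+1)} + \tilde f'_{mL}b_{mL}. \qquad (5.71)$$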


Solving the upper left $L \times L$ block of (5.63), and substituting (5.68)-(5.71), one obtains

$$\beta_m = \beta_{m-1} + \beta_{m-1}\zeta'_{m-1}\beta'_m\phi'_{m-1}\beta_{m-1}. \qquad (5.72)$$

Similarly, from the lower left $L \times L$ block of (5.63), one obtains

$$\beta'_{m-1}\phi_{m-1}\beta_m = \beta'_m\phi'_{m-1}\beta_{m-1}. \qquad (5.73)$$

Substituting (5.73) into (5.72),

$$\beta_m = \left(I_L - \beta_{m-1}\zeta'_{m-1}\beta'_{m-1}\phi_{m-1}\right)^{-1}\beta_{m-1}. \qquad (5.74)$$

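Finally, in order to solve the equation $R_{ML}w_{ML} = d_{ML}$, note that (5.57) through (5.59) are still valid. Recall that $(\Delta_m)_i = (d_{mL})_{(m-1)L+i}$ for $i = 1, 2, \ldots, L$, and $\Gamma_m = \Delta_{m+1} - b''_{mL}E^*_{mL}w_{mL}$. Making use of the second equality of (5.63) and (5.59), one easily obtains

$$w_{(m+1)L} = \begin{bmatrix} w_{mL} \\ 0_{L,1} \end{bmatrix} + \begin{bmatrix} \tilde e_{mL} \\ E_L \end{bmatrix}\beta'_mE_L\Gamma_m. \qquad (5.75)$$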


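The following is a summary of the algorithm for the solution of $R_{ML}w_{ML} = d_{ML}$ for the special case that $R_{ML}$ is a persymmetric block Toeplitz matrix.

PROBLEM FORMULATION: $R_{ML}w_{ML} = d_{ML}$,

$$R_{ML} = \begin{bmatrix} \alpha_0 & a''_{(M-1)L} \\ b_{(M-1)L} & R_{(M-1)L} \end{bmatrix}, \qquad E_{ML}R'_{ML}E_{ML} = R_{ML},$$

$a_{mL} = (\alpha_1, \alpha_2, \ldots, \alpha_m)''$, $b_{mL} = (\alpha_{-1}, \alpha_{-2}, \ldots, \alpha_{-m})''$, $(1 \le m \le M-1)$, $w_{ML} = ?$

Initial values:

$$\begin{aligned}
&e'_{1L} = -\alpha_1\alpha_0^{-1}, \quad \beta_1 = (\alpha_0 + e'_{1L}\alpha_{-1})^{-1}, \quad f_{1L} = -\alpha_0^{-1}\alpha_{-1}, \\
&w_{1L} = \alpha_0^{-1}d_{1L}, \quad \Gamma_1 = \Delta_2 - \alpha_{-1}w_{1L}, \quad w_{2L} = \begin{bmatrix} w_{1L} \\ 0_{L,1} \end{bmatrix} + \begin{bmatrix} \tilde e_{1L} \\ E_L \end{bmatrix}\beta'_1E_L\Gamma_1.
\end{aligned}$$

Recursive relations: $(1 \le m \le M-2)$,

$$\begin{aligned}
\zeta_m &= \alpha_{m+1}E_L + a''_{mL}\tilde e_{mL}, \\
e'_{(m+1)L} &= (e'_{mL},\; 0_{L,L}) - \zeta_m\beta'_m(\tilde f'_{mL},\; E_L), \\
\phi_m &= E_L\alpha_{-(m+1)} + \tilde f'_{mL}b_{mL}, \\
f_{(m+1)L} &= \begin{bmatrix} f_{mL} \\ 0_{L,L} \end{bmatrix} - \begin{bmatrix} \tilde e_{mL} \\ E_L \end{bmatrix}\beta'_m\phi_m, \\
\beta_{m+1} &= \left(I_L - \beta_m\zeta'_m\beta'_m\phi_m\right)^{-1}\beta_m, \\
\Gamma_{m+1} &= \Delta_{m+2} - b''_{(m+1)L}E^*_{(m+1)L}w_{(m+1)L}, \\
w_{(m+2)L} &= \begin{bmatrix} w_{(m+1)L} \\ 0_{L,1} \end{bmatrix} + \begin{bmatrix} \tilde e_{(m+1)L} \\ E_L \end{bmatrix}\beta'_{m+1}E_L\Gamma_{m+1}.
\end{aligned}$$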
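In the scalar case $L = 1$ every Toeplitz matrix is persymmetric and $E_L = 1$, so the summary above reduces to a recursion on $e'_{mL}$, $f_{mL}$, and $\beta_m$ alone, with $\sim$ meaning plain reversal of a vector. The following Python sketch follows that scalar reduction; the function name and test data are hypothetical, and it assumes $M \ge 2$, $(R)_{ij} = \alpha_{j-i}$, and nonsingular leading submatrices. Exact `Fraction` arithmetic is used to check $R_{ML}w_{ML} = d_{ML}$.

```python
from fractions import Fraction


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def solve_persymmetric_l1(alpha, d):
    """Scalar (L = 1) reduction of the persymmetric recursion; R[i][j] = alpha[j - i], len(d) >= 2."""
    al = {k: Fraction(v) for k, v in alpha.items()}
    M, a0 = len(d), al[0]
    e = [-al[1] / a0]                   # e'_1
    f = [-al[-1] / a0]                  # f_1
    beta = 1 / (a0 + e[0] * al[-1])     # beta_1
    w = [Fraction(d[0]) / a0]           # w_1
    for m in range(1, M):
        b_rev = [al[-(m - i)] for i in range(m)]    # (alpha_-m, ..., alpha_-1)
        Gam = Fraction(d[m]) - dot(b_rev, w)        # Gamma_m
        e_rev, f_rev = e[::-1], f[::-1]             # e~ and f~ (row reversal, E_L = 1)
        w = [wi + x * beta * Gam for wi, x in zip(w, e_rev)] + [beta * Gam]   # (5.75)
        if m == M - 1:
            break
        a = [al[i] for i in range(1, m + 1)]        # a_m
        b = [al[-i] for i in range(1, m + 1)]       # b_m
        zeta = al[m + 1] + dot(a, e_rev)            # (5.69)
        phi = al[-(m + 1)] + dot(f_rev, b)          # (5.71)
        e, f = (
            [x - zeta * beta * y for x, y in zip(e + [0], f_rev + [1])],   # (5.68)
            [x - y * beta * phi for x, y in zip(f + [0], e_rev + [1])],    # (5.70)
        )
        beta = beta / (1 - beta * zeta * beta * phi)                      # (5.74)
    return w


alpha = {-4: 1, -3: 2, -2: -3, -1: 4, 0: 20, 1: 5, 2: -1, 3: 2, 4: 3}   # hypothetical lags
d = [1, 2, 3, 4, 5]
w = solve_persymmetric_l1(alpha, d)
R = [[alpha[j - i] for j in range(5)] for i in range(5)]
print(all(dot(R[i], w) == d[i] for i in range(5)))   # True
```

Compared with the general recursion, only two border vectors are propagated, which is where the savings discussed next come from.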


If the inverse of $R_{ML}$ is desired, the above algorithm can be used with the expressions for $w_{(m+2)L}$ and $\Gamma_{m+1}$ deleted. After $\beta_{M-1}$, $e'_{(M-1)L}$, and $f_{(M-1)L}$ have been computed, equation (5.67) can be used with $m = M-1$ to generate $B_{ML}$, using (5.64)-(5.66) as initial conditions.

The computational requirements of the above algorithm are virtually identical with those of the previous algorithm for Hermitian block Toeplitz $R_{ML}$. The reason for this similarity is clear: a Hermitian matrix has conjugate symmetry about the main diagonal, and a persymmetric matrix has symmetry about the main cross diagonal.

Finally, in case $R_{ML}$ is a Hermitian, persymmetric block Toeplitz matrix, the computational requirements of the above algorithm can be approximately halved. The simplification is easily obtained by substituting $a''_{mL} = b'_{mL}$, $f_{mL} = e_{mL}$, and $\phi_m = \zeta'_m$ into the above algorithm. The resulting algorithm is summarized below.
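PROBLEM FORMULATION: $R_{ML}w_{ML} = d_{ML}$,

$$R_{ML} = \begin{bmatrix} \alpha_0 & b'_{(M-1)L} \\ b_{(M-1)L} & R_{(M-1)L} \end{bmatrix}, \qquad E_{ML}R'_{ML}E_{ML} = R_{ML},$$

$b_{mL} = (\alpha'_1, \alpha'_2, \ldots, \alpha'_m)''$, $(1 \le m \le M-1)$, $w_{ML} = ?$

Initial values:

$$\begin{aligned}
&e'_{1L} = -\alpha_1\alpha_0^{-1}, \quad \beta_1 = (\alpha_0 + e'_{1L}\alpha'_1)^{-1}, \quad w_{1L} = \alpha_0^{-1}d_{1L}, \quad \Gamma_1 = \Delta_2 - \alpha'_1w_{1L}, \\
&w_{2L} = \begin{bmatrix} w_{1L} \\ 0_{L,1} \end{bmatrix} + \begin{bmatrix} \tilde e_{1L} \\ E_L \end{bmatrix}\beta'_1E_L\Gamma_1.
\end{aligned}$$

Recursive relations: $(1 \le m \le M-2)$,

$$\begin{aligned}
\zeta_m &= \alpha_{m+1}E_L + b'_{mL}\tilde e_{mL}, \\
e'_{(m+1)L} &= (e'_{mL},\; 0_{L,L}) - \zeta_m\beta'_m(\tilde e'_{mL},\; E_L), \\
\beta_{m+1} &= \left(I_L - \beta_m\zeta'_m\beta'_m\zeta'_m\right)^{-1}\beta_m, \\
\Gamma_{m+1} &= \Delta_{m+2} - b''_{(m+1)L}E^*_{(m+1)L}w_{(m+1)L}, \\
w_{(m+2)L} &= \begin{bmatrix} w_{(m+1)L} \\ 0_{L,1} \end{bmatrix} + \begin{bmatrix} \tilde e_{(m+1)L} \\ E_L \end{bmatrix}\beta'_{m+1}E_L\Gamma_{m+1}.
\end{aligned}$$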
D. Discussion

In this chapter, special forms of data correlation matrices which can arise in discrete-time stochastic signal processing applications have been considered. Computationally efficient algorithms have been presented for the solution of $R_{xx}w = P$ as well as for obtaining $R_{xx}^{-1}$ for these special forms. These results, the development of which relies heavily on generalizations of the results of Zohar [67], [69], are of interest in their own right. The application of these results to filter design problems for which $R_{xx}$ and $P$ are known is straightforward.

In case $R_{xx}$ and/or $P$ are unknown and a fixed (nonadaptive) filter is desired, the results of this chapter are still applicable. The obvious approach is to use estimates of $R_{xx}$ and/or $P$. An alternative approach to the design of FIR filters, with the signal and noise structure of Example 1 in Section IV-B, involves the utilization of "approximate" spectral density functions, and has been treated by Farden and Scharf [63]. Extensions of the concepts treated in [63] to the design of multidimensional FIR filters can be accomplished with the aid of Section V-C.

The results of this chapter are also useful for performing simulations of adaptive structures. In performing such simulations, it is advantageous to generate data having known covariance functions in order to evaluate the performance of the adaptive processor by computing, e.g., $\|W_k - w_o\|$ with $W_k$ obtained from some form of (3.1) and $w_o$ the solution to $R_{xx}w = P$. The results of this chapter are ideally suited for the computation of $w_o$ for several cases of practical interest, as discussed in Section V-A.
Finally, the results of Section V-A suggest modifications of the algorithms discussed in Chapter II which should result in an increased convergence rate without a severe increase in storage requirements. Consider the algorithm

$$W_{k+1} = W_k + \frac{1}{k}\left(P - F_kW_k\right), \qquad (5.76)$$

where $F_k = X_kX'_k$ and $X_k$ is a $p$-element data vector as in Example 1 of Section IV-B. Under conditions established in Chapter IV, $W_k \to w_o$ a.s. as $k \to \infty$, and hence $Y_k \to s_k$ a.s. as in Example 1. In order to implement algorithm (5.76), the only storage needed is for $W_k$, $P$, $X_k$, $Y_k$, and $1/k$, or $3p + 2$ words. This small storage requirement (as well as the minimal computational requirement) is indeed a practical advantage. In many applications, convergence rate is an extremely important issue. One would certainly expect an algorithm of the form

$$W_{k+1} = W_k + \frac{1}{k}\left(P - K^{-1}\sum_{\ell=k-K+1}^{k}F_\ell\,W_k\right) \qquad (5.77)$$


for any integer $K > 1$ to converge faster than (5.76). Note that in order to implement (5.77), the $p \times p$ matrix

$$K^{-1}\sum_{\ell=k-K+1}^{k}F_\ell \qquad (5.78)$$

must be computed and stored. For large $p$, the storage requirement alone can preclude the use of algorithms such as (5.77). For the case being considered, $R_{xx}$ is a Toeplitz matrix having only $p$ distinct elements. Consequently, one is led to consider algorithms of the form


$$W_{k+1} = W_k + \frac{1}{k}\left(P - F^*_kW_k\right), \qquad (5.79)$$

where $F^*_k$ is an unbiased estimate of $R_{xx}$ and is constrained to be Toeplitz, e.g., consider $F^*_k$ with the $ij$th element given by

$$(F^*_k)_{ij} = \frac{1}{p - |i-j|}\sum_{\ell=1}^{p-|i-j|}(X_k)_\ell\,(X_k)_{\ell+|i-j|}. \qquad (5.80)$$

The idea here is that the $ij$th element of $F^*_k$ is the average of all terms on the $|i-j|$ diagonal of $X_kX'_k$. Clearly, $E\{F^*_k\} = R_{xx}$ and $F^*_k$ is Toeplitz. An obvious alternative to (5.79) with $F^*_k$ given by (5.80) is


$$W_{k+1} = W_k + \frac{1}{k}\left(P - K^{-1}\sum_{\ell=k-K+1}^{k}F^*_\ell\,W_k\right) \qquad (5.81)$$

with $F^*_\ell$ given by (5.80). Algorithms (5.79) and (5.81) can be implemented with virtually no increase in storage requirements over (5.76). It seems reasonable to conclude that algorithms such as (5.79) and (5.81) are viable alternatives to (5.77) in cases where storage requirements are an important issue and algorithm (5.76) converges too slowly to be of interest. Extensions of algorithms such as (5.79) and (5.81) for block Toeplitz $R_{xx}$ are immediate. It is obvious that additional analytical work on the issue of convergence rate is necessary to evaluate the above remarks.
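The storage argument behind (5.79) and (5.80) can be sketched in a few lines of Python. This is a minimal illustration assuming real data with $X_k$ given as a list of $p$ samples; the function names are hypothetical. Because $F^*_k$ is Toeplitz, only its first row (the $p$ diagonal averages) is formed, and the product $F^*_kW_k$ is computed from that row without ever storing a $p \times p$ matrix. The gain $1/k$ follows (5.79).

```python
def toeplitz_estimate(x):
    """First row r of F*_k in (5.80): r[s] averages x[l] * x[l + s] over the p - s available products."""
    p = len(x)
    return [sum(x[l] * x[l + s] for l in range(p - s)) / (p - s) for s in range(p)]


def sa_step(W, P, x, k):
    """One step of (5.79): W_{k+1} = W_k + (1/k) (P - F*_k W_k), using only the p numbers in r."""
    r = toeplitz_estimate(x)
    p = len(W)
    FW = [sum(r[abs(i - j)] * W[j] for j in range(p)) for i in range(p)]   # (F*_k W_k)_i
    return [W[i] + (P[i] - FW[i]) / k for i in range(p)]


print(toeplitz_estimate([1.0, 2.0, 3.0]))                        # [4.666666666666667, 4.0, 3.0]
print(sa_step([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, 2.0, 3.0], 1))   # [1.0, 1.0, 1.0]
```

A windowed version in the spirit of (5.81) would average the rows $r$ from the last $K$ data vectors, which still requires only vectors of length $p$ rather than a $p \times p$ matrix.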


VI. CONCLUSION

In this work, new almost sure convergence results for a special form of the multidimensional Robbins-Monro stochastic approximation procedure are given. The form treated has been motivated by adaptive signal processing applications. Several types of data correlation matrices (e.g., Toeplitz and "block" Toeplitz) have been examined and new computationally efficient procedures have been given for both the inversion of a matrix having this special form and for solving a corresponding set of simultaneous linear equations. In this chapter, these new results are summarized, and suggestions for future work are presented.

A. Summary of New Results

The new convergence results of this work, presented in Chapter IV, are applicable to any algorithm that may be cast into the form of equation (3.1). It is shown in Chapter II that this particular form is applicable to many of the algorithms that have been proposed for adaptive signal processing applications. Although many proposed algorithms make use of a constant gain sequence, i.e., $\mu_k = \mu$, it is pointed out in Section III-B that in order for these algorithms to be asymptotically unbiased when used with correlated data, the condition that $\mu_k \to 0$ is essential. The theorem which is stated and proved in Section IV-A transforms the convergence problem from consideration of the a.s. convergence of a stochastic difference equation to the a.s. convergence of several stochastic sequences. Corollary 2 of Section IV-A provides sufficient conditions on the decay rates of the autocovariance functions of the sequences $\{F_k\}$ and $\{P_k\}$ to establish the conditions of the theorem. In Section IV-B the results of Corollary 2 are applied to several specific algorithms that have been proposed for adaptive signal processing. In particular, in the normal case with $F_k = X_kX'_k$, $P_k = s_kX_k$ or $P_k = P = E\{s_kX_k\}$, $\mu_k = O(k^{-1})$, $\lim_{k \to \infty} k\mu_k > 0$, and $W_k$ given by (3.1), if $\{s_k\}$, $\{X_k\}$ are jointly wide-sense stationary and all scalar correlation functions $\gamma(u)$ which can be computed for $\{s_k\}$, $\{X_k\}$ satisfy $\lim_{u \to \infty} u^{1/2}|\gamma(u)| < \infty$, then $|V_k| \to 0$ a.s. as $k \to \infty$. For example, if $\{s_k\}$ and $\{X_k\}$ are finite-order autoregressive moving average processes, or can be viewed as samples of strictly bandlimited continuous time processes, then $|V_k| \to 0$ a.s. as $k \to \infty$. Furthermore, even in the non-normal case, if $\{F_k\}$ and $\{P_k\}$ are M-dependent and $E\{\|F_k\|^q\}$, $q > 2$, is bounded, then $|V_k| \to 0$ a.s. as $k \to \infty$ for suitable $\{\mu_k\}$.

In Chapter V, special forms of data correlation matrices, $R_{xx}$, that are shown to arise in discrete time signal processing applications are examined. New computationally efficient procedures are developed for both the computation of $R_{xx}^{-1}$ and the solution of $R_{xx}w = P$ in case $R_{xx}$ is Toeplitz or block Toeplitz. The new procedures can result in a significant savings in storage requirements and computation time over standard solution techniques. For example, when $R_{xx}$ is an $ML \times ML$ symmetric matrix having $M^2$ $L \times L$ submatrices arranged in Toeplitz form, the appropriate new procedure for the solution of $R_{xx}w = P$ requires approximately $2M^2L^3$ operations compared with approximately $(ML)^3/3$ for standard algorithms.

B. Suggestions for Future Work

Although the new convergence results presented in Chapter IV for algorithms of the form (3.1) seem to be the strongest convergence results yet obtained under the weakest conditions, there are several important issues remaining. The results of Chapter IV apply to algorithms of the form of (3.1) with $E\{F_k\}$ symmetric and positive definite. Extensions to $E\{F_k\}$ nonsymmetric and positive definite are straightforward in view of Theorem 6.1 of [57]. Algorithms for which a.s. convergence was explicitly developed in Chapter IV can be interpreted as stochastic gradient-following algorithms, such as proposed by Widrow et al. [37], Griffiths [38], and Gersho [9]. Although stochastic projected gradient algorithms such as proposed by Lacoss [32] and Frost [33] can be cast into the form of (3.1), $E\{F_k\}$ is only positive semidefinite. It is the author's opinion that the results of Chapter IV can be easily extended to the analysis of these stochastic projected gradient algorithms.

An extremely important issue that warrants serious analytical treatment is the issue of convergence rate and the tradeoffs involved between convergence rate and computational requirements. In this regard, a treatment of truncated algorithms such as (3.41), algorithms which use a data-dependent gain sequence $\{\mu_k\}$, and decision-directed and decision feedback strategies would certainly seem to be of great interest. Other areas that merit additional work include (i) the effects of quantization errors on the convergence properties of these algorithms, (ii) strategies for use in nonstationary environs, and (iii) the asymptotic distribution of the "weight vector" for algorithms used with correlated training data.

Finally, the new results obtained in this work have applications to areas outside the realm of the adaptive signal processing schemes discussed in Chapter II. For example, the algorithms proposed by Saridis and Stein [73], and Graupe and Perl [74] for the identification of systems fall directly into the framework of the new convergence results treated in Chapter IV. In fact, the results of Chapter IV provide analytical justification for an even broader family of algorithms than proposed in [73] and [74]. The new results of Chapter V are also applicable to the system identification problem.
KEY WORDS: Stochastic approximation, Robbins-Monro procedure, correlated data, time series, adaptive signal processing, moment conditions, autocovariance decay rate conditions, adaptive filters, adaptive arrays, minimum mean square error filter, FIR filter, Toeplitz matrix, block Toeplitz matrix.
